cDNA clones encoding minor ampullate spidroin proteins (MiSP) are described. The translated amino acid sequence of the cloned
cDNA shows that the MiSPs have a structure which exhibits an amino-proximal nonrepetitive region, a repetitive portion and a
carboxy-proximal nonrepetitive portion. The repetitive portion of the sequence is describable by a generic repeat formula. Comparison
of the amino acid sequences derived from the translation with the sequences of short peptides obtained from solubilized minor
ampullate spider silk suggests that the nonrepetitive portions of the protein are cleaved from the protein during secretion from the
cells synthesizing the spidroins. This comparison also suggests that the minor ampullate spider silk is composed of at least three
polypeptides.

cDNAs Encoding Minor Ampullate Spider Silk Proteins

RELATED APPLICATIONS 

The present application is related to copending
application USSN 07/684,819, filed April 15, 1991, the
entire contents of which are hereby incorporated by
reference.

FIELD OF THE INVENTION 

The present invention relates to polypeptides that
form macroscopic fibers and to cloned DNA encoding such
polypeptides.

The proteins are some of those which constitute silks
made by spiders. Preferred embodiments of the present
invention are those silk proteins made in the minor
ampullate glands of the spider Nephila clavipes. The
silks of the present invention also encompass fibers
made from synthetic polypeptides of amino acid sequences
derivable from the amino acid sequence of the N.
clavipes ampullate silks or made from polypeptides
expressed from cloned DNA obtained from a library of
spider complementary or genomic DNA.

BACKGROUND OF THE INVENTION

The orb web spiders (Nephila) possess six types of
silk synthetic glands, two of which are the major and
minor ampullate organs. The major and minor ampullate
silks are distinguishable by their physical and chemical
properties.

The major ampullate (dragline) silk possesses
unique physical properties, combining high tensile
strength and substantial elasticity [Denny, M.W., J. Exp.
Biol., 65, 483-506 (1976); Lucas, F., Discovery, 25, 20-26
(1964)]. Previous investigations suggest that spider
silk is composed of a single large protein, primarily
containing pseudo-crystalline regions of stacked β-pleated
sheet alternating with amorphous domains [Warwicker,
J.O., J. Mol. Biol., 2, 350-362 (1960); Lucas, F. et al.,
J. Text. Inst., 46, T440-T452 (1985); Hepburn, H.R. et
al., Insect Biochem., 9, 69-71 (1979)].

In fact, the major ampullate silk of Nephila
clavipes was found to be a composite of two
proteins. cDNA clones encoding both of the proteins
comprising the major ampullate silk are described in
copending application USSN 07/684,819. We describe
herein the isolation and characterization of cDNA clones
encoding proteins composing minor ampullate silk.

SUMMARY OF THE INVENTION 

Spider silk is composed of fibers formed from 
proteins. We have found that natural spider silk fibers 
are composites of two or more proteins. However, it is 
possible to form fibers from a single spider silk 
protein. In general, spider silk proteins are found to 
have primary amino acid sequences that can be 
characterized as indirect repeats of a short consensus 
sequence. Variation in the consensus sequence is then 
responsible for the distinguishable properties of the 
different silk proteins.

Furthermore, silk fibers can be made from synthetic
polypeptides having amino acid sequences substantially
similar to the consensus repeat unit of a silk protein
or from polypeptides expressed from cloned DNA encoding
a natural or engineered silk protein.

Thus, it is one object of the present invention to
provide cloned DNA which encodes a spider silk protein.
The cloned DNA is preferably obtained from an orb web
spider (Nephila). Cloned cDNA from the minor ampullate
gland of Nephila clavipes is described in detail below.

Naturally occurring spider silk proteins have an
imperfectly repetitive structure. However, the
imperfection in the repetition is likely to be a
consequence of the process by which the silk protein
genes evolved, rather than a requirement for fiber
formation. The imperfection in repetition is thus
likely to affect only subtly the characteristics of the
fibers which form from the aggregation of the protein
molecules. Accordingly, it is a second object of the
present invention to provide cloned DNA encoding an
engineered spider silk protein comprising a polypeptide
having direct repeats of a unit amino acid sequence.
Alternatively, the cloned DNA may encode several different
unit amino acid sequences to form a "copolymer" silk
protein.

It is a third object of the invention to provide a
spider silk protein expressed from a cloned DNA, wherein
the cloned DNA is obtained from spider ampullate gland
cDNA, genomic DNA, or synthetic DNA.

Finally, it is an additional object of the present
invention to provide fibers made from silk protein
obtained by expression of cloned DNA.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1A-1F shows the nucleotide and the amino
acid sequence translation of the insert from pMISS1.

Figure 2A-2D shows the nucleotide and the amino
acid sequence translation of the portions of the insert
from pMISS2 that have been sequenced. 2A shows 309
nucleotides at the 5' end of pMISS2. 2B shows 165
nucleotides of the PstI fragment 4 (see Figure 4). 2C,
2D show the 870 nucleotides at the 3' end of the insert
in pMISS2.

Figure 3A-3C shows the nucleotide and the amino 
acid sequence translation of the portions of the inserts 
from the 11-1 and 11-2 clones (pMISS3) that have been 
sequenced. 3A shows 165 nucleotides from the forward 
primer of the 11-1 clone. 3B shows 240 nucleotides from 
the reverse primer of the 11-1 clone. 3C shows 146 
nucleotides from the forward primer of the 11-2 clone. 

Figure 4 shows the alignment of the amino acid
sequences of the nonrepetitive regions of MiSP1 and
MiSP2.

Figure 5 shows a restriction map of the pMISS1
insert cDNA.

Figure 6 shows a restriction map of the pMISS2
insert cDNA. Beneath the restriction map is a schematic
showing the portions of the insert that have been
sequenced.

Figure 7 shows a flow chart description of the
synthesis of the pET19b-16 vector. Restriction sites
are designated as: B, Bsp EI; E, Eco RV; S, Sca I; X,
Xma I.

Figure 8A-8B shows analysis of the purification of
a synthetic spider silk protein expressed from the
pET19b-16 vector. 8A shows analysis of the crude lysate
at 1, 2 and 4 hours post-induction. 8B shows analysis
of the protein purified by Ni²⁺ affinity purification.

DETAILED DESCRIPTION OF THE INVENTION 

Studies in our laboratory have established that the
major ampullate silk is composed of two distinct
proteins. The major ampullate silk proteins possess the
secondary structure predicted by Warwicker and others.
The primary structure of the major ampullate silk
proteins is characterized by indirect repeat of a
discrete repeat unit. The sequence of the repeat unit
is different for each of the proteins comprising the
major ampullate silk.

The Nephila minor ampullate silk can be
distinguished from the Nephila major ampullate silk by
both physical and chemical properties. In contrast to
the elasticity exhibited by the major ampullate silk,
the minor ampullate silk is observed to yield without
recoil. The minor silk will stretch about 25% of its
initial length before breaking, exhibiting a tensile
strength of nearly 100,000 psi. The amino acid
composition of solubilized minor ampullate silk also
differs from that of solubilized major ampullate silk.

Like the major ampullate silk proteins (major
spidroin 1, MaSP1; major spidroin 2, MaSP2), the
proteins comprising minor ampullate silk are found to
have a primary structure dominated by imperfect
repetition of a short sequence of amino acids. A "unit
repeat" constitutes one such short sequence. Thus, the
primary structure of the spider silk proteins is
considered to consist mostly of a series of small
variations of a unit repeat. The unit repeats in the
naturally occurring proteins are often distinct from
each other. That is, there is little or no exact
duplication of the unit repeats along the length of the
protein. However, synthetic spider silks can be made
wherein the primary structure of the protein can be
described as a number of exact repetitions of a single
unit repeat. Additional synthetic spider silks can be
described as a number of repetitions of one unit repeat
together with a number of repetitions of a second unit
repeat. Such a structure would be similar to a typical
block copolymer. Of course, unit repeats of several
different sequences can also be combined.

An alternative way to describe the primary
structure of spider silk proteins is to consider a
"consensus" sequence that is derived from an alignment
of the unit repeats. Such a consensus sequence is the
length of most of the unit repeats and accounts for the
variation at each position of the unit repeat by
including the residue most common at each position. For
the MaSP2 protein, the consensus sequence derived is
GPGQQGPGGYGPGQQGPSGPGSAAAAAAAAAAGPGGY (see Table 2).
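
The "most common residue at each position" rule can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used to derive the consensus: the function name `consensus` is hypothetical, and the input is limited to the first 17 columns of five MaSP2 unit repeats from Table 2, where the rows line up without gaps.

```python
from collections import Counter

def consensus(aligned_repeats):
    """Return the most common residue in each column of equal-length repeats.

    Ties are resolved in favour of the residue seen first in the column,
    which is an arbitrary choice made for this sketch.
    """
    lengths = {len(r) for r in aligned_repeats}
    if len(lengths) != 1:
        raise ValueError("aligned repeats must all have the same length")
    return "".join(
        Counter(column).most_common(1)[0][0]
        for column in zip(*aligned_repeats)
    )

# First 17 residues of five unit repeats listed in Table 2 (MaSP2).
repeats = [
    "GPGQQGPGGYGPGQQGP",
    "GPGQQGPGRYGPGQQGP",
    "GPRQQGPGGYGQGQQGP",
    "GPGQQGPGGYGPGQQGL",
    "GPGQQGPGGYAPGQQGP",
]

print(consensus(repeats))
```

For these rows the column-wise vote reproduces the opening 17 residues of the MaSP2 consensus quoted in the text.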

Cloned DNA of the present invention includes
sequences shown in Figures 1A-1F, 2A-2C and 3A-3C. The
cloned DNA of the present invention also includes DNA
molecules made from Nephila DNA or RNA templates by PCR
or the like, using primers made from sequences shown in
Figures 1A-1F, 2A-2C and 3A-3C. Finally, cloned DNA of
the present invention also encompasses polynucleotides
which can hybridize to DNA having sequences shown in
Figures 1A-1F, 2A-2C and 3A-3C under hybridization
conditions typically used for library screening and
Southern blotting. Preferably such hybridization
conditions are those obtained by a solution of 6X SSC or
SSPE, 5X Denhardt's solution, 0.5% SDS at a temperature
of about 68°C, or those obtained by the same solution
that is also 50% in formamide at a temperature of about
42°C. Alternatively, the hybridization conditions are
those wherein the temperature is about 15-20°C below the
Tm calculated for the solution conditions. [See J.
Sambrook et al., Molecular Cloning: A Laboratory Manual,
2nd ed., pp. 9.47-9.58, c. 1989 by Cold Spring Harbor
Laboratory Press, Cold Spring Harbor, NY.]
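
As a rough illustration of the "15-20°C below the Tm" rule, the sketch below uses a widely quoted empirical estimate for long DNA:DNA hybrids, Tm = 81.5 + 16.6 log10[Na+] + 0.41(%G+C) - 0.63(%formamide) - 600/L. The text does not say which Tm formula is intended, so the choice of formula is an assumption, as are the function names and every input value (sodium concentration, GC content, probe length). The estimate is usually considered most reliable at moderate salt concentrations.

```python
import math

def estimate_tm(na_molar, gc_percent, length_bp, formamide_percent=0.0):
    """Empirical Tm estimate in deg C for a long DNA:DNA hybrid.

    Assumed formula (not specified in the text):
    81.5 + 16.6*log10[Na+] + 0.41*(%GC) - 0.63*(%formamide) - 600/L
    """
    return (81.5
            + 16.6 * math.log10(na_molar)
            + 0.41 * gc_percent
            - 0.63 * formamide_percent
            - 600.0 / length_bp)

def hybridization_window(tm, low_offset=15.0, high_offset=20.0):
    """Temperature range lying 15-20 deg C below the Tm, as (min, max)."""
    return (tm - high_offset, tm - low_offset)

# Hypothetical inputs chosen only to exercise the functions.
tm_aqueous = estimate_tm(na_molar=1.0, gc_percent=50.0, length_bp=500)
tm_formamide = estimate_tm(na_molar=1.0, gc_percent=50.0, length_bp=500,
                           formamide_percent=50.0)

print(hybridization_window(tm_aqueous))
print(hybridization_window(tm_formamide))
```

The formamide term shows why the aqueous and 50% formamide conditions in the text are paired with such different temperatures: each percent of formamide lowers the estimated Tm by roughly 0.6°C.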

The polypeptides of the present invention can be
made by direct synthesis or by expression from cloned
DNA. The means for expressing cloned DNA are generally
known in the art. However, there are some
considerations for design of expression vectors that are
unusual for expressing DNA encoding the spider silk
proteins of the present invention.

First, the proteins are highly repetitive in their
structure. Accordingly, cloned DNA should be propagated
and expressed in host cell strains that will maintain
repetitive sequences in extrachromosomal elements (e.g.
SURE™ cells, Stratagene). Also, due to the high content
of alanine, glycine, proline, and glutamine, it might be
advantageous to use a host cell which overexpresses tRNA
for these amino acids.

The proteins of the present invention can otherwise
be expressed using vectors providing for high level
transcription, fusion proteins allowing affinity
purification through an epitope tag, and the like. The
hosts can be either bacterial or eukaryotic. It is
considered that yeast, especially Saccharomyces
cerevisiae, or insect cells might be advantageous
eukaryotic hosts.

Fibrillar aggregates will form by spontaneous
self-assembly of spider silk proteins when the protein
concentration exceeds a critical value. The aggregates
can be gathered and mechanically spun into macroscopic
fibers according to the method of O'Brien et al. [I.
O'Brien et al., "Design, Synthesis and Fabrication of
Novel Self-Assembling Fibrillar Proteins", in Silk
Polymers: Materials Science and Biotechnology, pp. 104-117,
Kaplan, Adams, Farmer and Viney, eds., c. 1994 by
American Chemical Society, Washington, D.C.].

The following examples are provided to illustrate 
the invention in more detail. The examples are not to 
be taken as limiting the invention, the scope of which 
is rather defined by the claims following. 

Example 1: cDNA Clones Encoding Minor
Ampullate Silk Proteins

The minor ampullate glands are small, J-shaped
organs located in the abdomen of the spider. The minor
ampullate glands (about 20) were removed from a number
of spiders and frozen in liquid nitrogen. Total RNA was
prepared from the frozen tissue by standard methods.
cDNA was prepared from the total RNA using the
RIBOCLONE™ system (Promega). The synthesis method was
modified slightly by using pseudorandom hexamers in
addition to the NotI primer-adapter in the primer
extension steps. The pseudorandom hexamers were
synthesized having the sequence (A or T)(G or C)(G or
C)(A or T)(G or C)(G or C). Such hexamers reflect the
sequence bias in the minor ampullate silk proteins
(minor spidroins, MiSP) we hypothesized would be imposed
by repetition of alanine and glycine residues, which are
found in large proportion in the amino acid composition
of solubilized minor ampullate silk. We anticipated
that so biasing the primer composition would enrich the
library in long cDNAs encoding MiSP proteins.
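
The degenerate primer pool described above is small enough to enumerate. The following Python sketch lists every hexamer of the form (A or T)(G or C)(G or C)(A or T)(G or C)(G or C) and checks one illustrative member against a pair of alanine codons. The names `POSITIONS`, `pseudorandom_hexamers` and `reverse_complement` are hypothetical helpers, and the single primer singled out is an example chosen for this sketch, not one reported in the text.

```python
from itertools import product

# One entry per primer position, as described in the text.
POSITIONS = ["AT", "GC", "GC", "AT", "GC", "GC"]

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def pseudorandom_hexamers():
    """Return every hexamer allowed by the degenerate design."""
    return ["".join(bases) for bases in product(*POSITIONS)]

def reverse_complement(dna):
    """Reverse complement of a DNA string written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]

hexamers = pseudorandom_hexamers()

# The pool is a small, GC-rich subset of all 4**6 = 4096 possible hexamers.
print(len(hexamers), "of", 4 ** 6)

# Illustrative member: TGCTGC pairs with GCAGCA, i.e. two alanine codons (GCA GCA).
example = "TGCTGC"
print(example in hexamers, reverse_complement(example))
```

Every member of the pool has exactly four G or C bases out of six, which is the compositional bias the text attributes to alanine (GCN) and glycine (GGN) codons.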

The cDNA thus synthesized was ligated to
appropriately digested pGEM3Zf(-) plasmid (Promega) and
the ligation mixture was used to transform SURE™ E. coli
cells (Stratagene). Plasmid DNA was prepared from
randomly selected transformed colonies and the insert
DNA was partially sequenced, using the forward and
reverse primers provided by the supplier (Promega) that
are complementary to the vector sequence near the
insert. Clones having inserts encoding highly
repetitive sequences were examined in greater detail
with respect to insert size. Clones having an insert
size greater than 1.5 kbp were sequenced in their
entirety.

The entire insert of pMISS1 (encoding MiSP1)
has been sequenced. The nucleotide sequence and the
resulting translation are shown in Figure 1. A
restriction map is shown as Figure 5. The region from
nucleotides 96-137 is represented as indeterminate. That
portion of the cDNA is found to have a much higher GC
content than the remainder of the sequence. As a
result, that portion of the nucleotide sequence has not
been resolved due to "compression" observed in the
electrophoresis step. pMISS1 contains an open reading
frame beginning with the ATG start codon at nucleotides
183-185. The open reading frame encodes a
5'-nonrepetitive region, an indirect repetitive region and
a 3'-nonrepetitive region. The 5'-nonrepetitive region
contains a sequence of about 16 residues (amino acids
2-17) that conforms to secretion signal sequences. The
presence of the leader peptide suggests that the MiSP1
protein is processed and secreted through the
endoplasmic reticulum.

Table 1 shows the MiSP1 amino acid sequence
formatted to show the 13 unit repeats of the MiSP1
protein.

Table 1

Minor Ampullate Spidroin 1 Residues 92-706, showing
alignment of unit repeats:

GAAGAGGYGRGAG GYGGQGGYGAGAGAGAAAAA
GAGAGGAGGYGRGAGAGAGAAAGAGAGAGGAGYGGQGGYGAGAGAGAAAAA
GAGAGGAGGYGRGAGAGAGAAAGAGA GGYGGQGGYGAGAGAGAAAAAA
GAGSGGAGGYGRGAGAGAGAAAGAGAGA--GSYGGQGGYGAGAGAGAAAAA
GAGAGGAGGYGRGAGAGAGAGAGAAARAGAGAGG AAAAA
GAGAGGAGGYGRGAGAGAGAAAGAGAGA GGYGGQSGYGAGAG--AAAAA
GAGAGGAGGYGRGAGAGAGAAAGAGAGAAAGAGAGGYGGQGGYGAGAGAGAAAAA
GAGAGGAGGYGRGAGAGAGAAAGAGAG GYGGQGGYGAGAGAGAAAAA
-TGAGGAGGYGRGAGAGAGAAAGAGAGTGGAGYGGQGGYGAGAGAGAAAAA
GAGAGGAG-YGRGAGAGAGAAAGAGAGAAAGAGAGAGGYGGQGGYGAGARAGAAAAA
GAGAGGAAGYSRGGRAGAAGAGAGAAAGAGAGAGGYGGQGGYGAGAGAGAAAAA
GAGSGGAGGYGRGAGAGAAAGAGAAAGAGAGAGGYGGQGGYGAGAGAAAAA
GAGAGRGGYGRGAGAGGYGGQGGYGAGAGAGAAAAA

- added for purposes of alignment

Each repeat is a variation of the consensus amino
acid sequence

RGAAGAAGAGAGAAAGAGAGAGAGGYGGQGGYGAGAGAGAAAAAGAGAGGAGGYG.

This repetitive region can be described as a mixture of
two types of units, (1) dimers of alanine separated by
glycine residues, and (2) dimers of glycine separated by
tyrosine or glutamine residues. It is thus
distinguishable from the consensus sequence of the MaSP2
protein, which can be characterized as predominantly
dimers of glycine or glutamine separated by proline or
tyrosine residues.

Alternatively, the majority of the amino acid
sequence of the MiSP1 protein can be described by a
repeat unit having the generic formula:

(GR)(GA)_l(A)_m(GGX)_n(GA)_l(A)_m

where X is tyrosine, glutamine or alanine and
where l = 1 to 6, m = 0 to 4 and n = 1 to 4.
This finding is similar to what was observed for the
MaSP1 and MaSP2 proteins, which exhibit the generic
formulas:

MaSP1:

(XGG)_w(XGA)(GXG)_x(AGA)_y(G)_zAG

where X is tyrosine or glutamine
and where w = 2-3, x = 1-3, y = 5-7, and z = 1 or 2.

MaSP2:

(GPG_2YGPGQ_2)_a(X)_2S(A)_b

where X = GPG or GPS
and where a = 2 or 3 and b = 7 to 10.

Inspection of the amino acid sequence of MiSP1
shows that, for the most part, the protein can be viewed
as a derivatized polyamide. Accordingly, a polypeptide
having the less complex generic formula:

(GGX)_n(GA)_m(A)_l

where X is tyrosine, glutamine or alanine and
where l = 1 to 6, m = 0 to 4 and n = 1 to 4,

would also be expected to have many of the properties of
the MiSP1 protein.
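
The simplified formula, read as GGX repeated n times, then GA repeated m times, then A repeated l times, maps naturally onto a regular expression. The Python sketch below is one possible encoding under that reading; `SIMPLE_UNIT` and `build_unit` are hypothetical names, and the test fragment is a stretch copied from the MiSP1 consensus sequence given earlier in the text.

```python
import re

# (GGX)n (GA)m (A)l with X in {Y, Q, A}; n = 1-4, m = 0-4, l = 1-6.
SIMPLE_UNIT = re.compile(r"(?:GG[YQA]){1,4}(?:GA){0,4}A{1,6}")

def build_unit(xs, m, l):
    """Build one unit: a GGX triplet per residue in xs, then m GA pairs, then l alanines."""
    if not (1 <= len(xs) <= 4 and set(xs) <= set("YQA")):
        raise ValueError("xs must be 1-4 residues drawn from Y, Q, A")
    if not (0 <= m <= 4 and 1 <= l <= 6):
        raise ValueError("m must be 0-4 and l must be 1-6")
    return "".join("GG" + x for x in xs) + "GA" * m + "A" * l

# Stretch of the MiSP1 consensus sequence quoted in the text.
fragment = "GGYGGQGGYGAGAGAGAAAAA"

unit = build_unit("YQY", m=4, l=4)
print(unit == fragment)
print(SIMPLE_UNIT.fullmatch(fragment) is not None)
```

With n = 3, m = 4 and l = 4 the generated unit coincides with the consensus fragment, which gives a quick sanity check that the parameter ranges cover at least this part of the natural sequence.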

The 3'-nonrepetitive coding region of pMISS1
encodes a 96 amino acid spider silk consensus sequence
that is 50% and 49% identical to the 3'-nonrepetitive
regions of MaSP1 and MaSP2, respectively. The coding
region ends at nucleotide position 2634 with a TAA stop
codon. The 3' untranslated region of pMISS1 contains a
poly(A) tail.

The majority of the pMISS2 (encoding MiSP2) cDNA
has been sequenced. The insert in pMISS2 is 1.6 kbp in
length, of which 1344 nucleotides have been determined.
The nucleotide sequence and translation of the completed
portions of the DNA sequence are shown in Figure 2.
Figure 6 shows a restriction map of the pMISS2 clone and
indicates what portions of the cDNA insert have been
sequenced. pMISS2 contains an open reading frame
beginning at the 5' end of the insert that does not
begin with a methionine. This result strongly suggests
that the pMISS2 cDNA lacks nucleotides encoding the
amino terminus of the MiSP2 protein. The pMISS2 cDNA,
like the pMISS1 cDNA, encodes a 5'-nonrepetitive region,
a repetitive region and a 3'-nonrepetitive region. The
5'- and 3'-nonrepetitive regions of MiSP1 and MiSP2 are
aligned in Figure 4. In contrast to MiSP1, the unit
repeat that characterizes the repetitive region in MiSP2
is cryptic. As no clear unit repeat is yet
distinguishable, no consensus repeat unit has yet been
derived. However, it is clear from inspection of the
repetitive portion of MiSP2 that it is distinguishable
from the repetitive portion of MiSP1.

Another pair of clones, designated 11-1 and 11-2,
respectively (collectively pMISS3), are independent
isolates of the same cDNA and are found to encode a
third minor ampullate silk polypeptide (MiSP3). 11-1
contains a 2 kbp insert; 11-2 contains a 1.5 kbp insert.
Partial nucleotide sequences have been obtained from
both of these clones to date. The nucleotide sequences
and translations thereof are presented as Figures 3A-3C.

Three different types of N-bromosuccinimide (NBS)
peptides from minor ampullate silk have been purified.
The first type of peptide has the amino acid sequence
GGQGGY. The second type of peptide has a sequence
encompassed by the generic formula (GA)_n, where n = 3.5,
4.5, or 8.5. The third type of peptide has the
sequence (G)_n, where n = 6 or 9. The pMISS1, pMISS2, and
pMISS3 clones all encode the GGQGGY peptide and some
variation of the (GA)_n peptide. However, none of the
isolated cDNAs, so far as they have been characterized
to date, encodes a (G)_n peptide. Since pMISS1 has been
completely sequenced, except for a small region of 42
nucleotides in a highly compressed region (high GC
content), and does not contain the (G)_n peptide, the minor
ampullate silk must contain at least two proteins.
Furthermore, while portions of the nonrepetitive
regions of MiSP2 are identical to parts of the
nonrepetitive regions of MiSP1, the nonrepetitive regions
of the two proteins are different. Also, the repetitive
regions of MiSP1 and MiSP2 are
distinguishable (see below). Although nonrepetitive
portions have not yet been found in MiSP3, the repeats
encoded by the 11-series isolates are distinguishable
from the repeats of both MiSP1 and MiSP2 on two bases:
(1) the spacing between Gln residues is only about one-half
that seen in MiSP1 and MiSP2, and (2) Phe residues
occasionally precede the GGQGGY sequence whereas a Tyr
always precedes the GGQGGY sequence in MiSP1. Thus,
the minor ampullate gland produces a silk comprised of
at least three proteins.

Example 2: Expression of a cDNA Encoding a Polypeptide
Comprising the MaSP2 Consensus Sequence

In order to demonstrate expression of an engineered
spider silk protein, the consensus sequence from the
MaSP2 protein (USSN 07/684,819) was cloned into an E.
coli expression vector. The consensus sequence was
determined, using the considerations described above,
from the alignment of the unit repeats of the MaSP2
protein. Table 2 shows the alignment of the unit
repeats of the MaSP2 protein.



Table 2

Alignment of Unit Repeats of the MaSP2 Protein

GPGQQGPGGYGPGQQGP--SGPGSAAAAAAAAAA GPGGYGPGQQGPGGY
GPGQQGPGRYGPGQQGP--SGPGSAAAAAA GSGQQGPGGY
GPRQQGPGGYGQGQQGP--SGPGSAAAASAAASAESGQQGPGGYGPGQQGPGGY
GPGQQGPGGYGPGQQGP--SGPGSAAAAAAAS GPGQQGPGGY
GPGQQGPGGYGPGQQGP--SGPGSAAAAAAAAS GPGQQGPGGY
GPGQQGPGGYGPGQQGL--SGPGSAAAAAAA
GPGQQGPGGYGPGQQGP--SGPGSAAAAAAAAA GPGGY
GPGQQGPGGYGPGQQGP--SGAGSAAAAAAA GPGQQGLGGY
GPGQQGPGGYGPGQQGPGGYGPGSASAAAAAA
GPGQQGPGGYGPGQQGP--SGPGSASAAAAAAAA GPGGY
GPGQQGPGGYAPGQQGP--SGPGSASAAAAAAAA GPGGY
GPGQQGPGGYAPGQQGP--SGPGSAAAAAAASA GPGGY

The synthesis of the expression vector is described
below and shown schematically in Figure 7.

Two synthetic oligonucleotides were synthesized: 

1) an 84 base oligonucleotide, named S2long

5'-TCTAGCCCGGGTGGCTATGGTCCTGGACAGCAAGGTCCTGGCGGTTACGGTCC
TGGCCAACAGGGTCCCTCTGGTCCAGGCAGT-3'

2) a 59 base oligonucleotide, named S2short

5'-TCCGGACCTGCTGCGGCGGCTGCGGCAGCTGCACTGCC
TGGACCAGAGGGACCCTGTTG-3'

These oligonucleotides were designed to hybridize
to each other in a 27 base region of complementarity, on
the 3' end of each respective oligonucleotide. When the
rest of the bases were filled in by VENT™ polymerase
(New England Biolabs) and the product digested with Xma
I (recognition site CCCGGG), a double-stranded segment
of DNA resulted which encoded the basic repetitive unit
of MaSP2 (in single letter amino acid code):
PGGYGPGQQGPGGYGPGQQGPSGPGSAAAAAAAAG 
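
The design can be checked computationally from the two oligonucleotide sequences. The Python sketch below tests the 27 base complementarity, models the polymerase fill-in as a simple extension of the top strand, and translates the product starting just after the Xma I cut. It models the oligonucleotides as designed, not the sequence of any clone actually recovered; the helper names are hypothetical, and the codon table is the standard genetic code written in the usual T, C, A, G order.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Reverse complement of a DNA string written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def translate(dna):
    """Translate complete codons from position 0; trailing bases are ignored."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

S2LONG = ("TCTAGCCCGGGTGGCTATGGTCCTGGACAGCAAGGTCCTGGCGGTTACGGTCC"
          "TGGCCAACAGGGTCCCTCTGGTCCAGGCAGT")
S2SHORT = ("TCCGGACCTGCTGCGGCGGCTGCGGCAGCTGCACTGCC"
           "TGGACCAGAGGGACCCTGTTG")
OVERLAP = 27

# The 3' ends should pair with each other over 27 bases.
overlap_ok = S2LONG[-OVERLAP:] == reverse_complement(S2SHORT[-OVERLAP:])

# Top strand after fill-in: S2long extended by the complement of the
# unpaired 5' portion of S2short.
filled_in = S2LONG + reverse_complement(S2SHORT[:-OVERLAP])

# Xma I cuts C^CCGGG, so the coding frame begins one base into the site.
start = filled_in.index("CCCGGG") + 1
unit = translate(filled_in[start:])[:35]

print(overlap_ok, len(filled_in))
print(unit)
```

The translated 35 residues agree with the repeat unit stated above, and the filled-in product ends in TCCGGA, the Bsp EI site that the downstream cloning steps rely on.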

The DNA segment, with an Xma I cut on the 5' end (with
respect to the coding strand) and the other end blunt
but containing a Bsp EI site, was ligated into
pBLUESCRIPT™ II (Stratagene) which had been double
digested with Xma I and Eco RV and agarose gel purified,
thus giving a directional cloning with the inserted
segment in frame with the lac I gene of pBLUESCRIPT™ II.
It is important to note for the strategy explained later
that Xma I and Bsp EI have compatible, nonregenerable
overlaps. That is, DNA cut with these enzymes can be
ligated, but the ligation will not regenerate either
site. The ligated DNA was subjected to Eco RI digestion
to reduce background (the Xma I, Eco RV digest of the
vector eliminated the unique Eco RI site of pBLUESCRIPT™
II) and used to transform competent SURE™ E. coli cells
(Stratagene).
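
The "compatible, nonregenerable" property follows directly from the two recognition sequences. In the Python sketch below, Xma I is taken to cut C^CCGGG and Bsp EI to cut T^CCGGA, each one base into its site; the function names are hypothetical and only the top strand is modelled.

```python
XMA_I = "CCCGGG"    # cut C^CCGGG
BSP_EI = "TCCGGA"   # cut T^CCGGA
CUT_OFFSET = 1      # both enzymes cut after the first base of the site

def sticky_overhang(site, cut=CUT_OFFSET):
    """Four-base 5' overhang left by a symmetric cut one base into the site."""
    return site[cut:len(site) - cut]

def ligate(upstream_site, downstream_site, cut=CUT_OFFSET):
    """Top-strand junction when the left half of one cut site joins the right half of another."""
    return upstream_site[:cut] + downstream_site[cut:]

sites = {XMA_I, BSP_EI}
junction_a = ligate(XMA_I, BSP_EI)   # Xma I end joined to Bsp EI end
junction_b = ligate(BSP_EI, XMA_I)   # Bsp EI end joined to Xma I end

print(sticky_overhang(XMA_I), sticky_overhang(BSP_EI))
print(junction_a, junction_a in sites)
print(junction_b, junction_b in sites)
```

Both enzymes leave the same CCGG overhang, so the ends anneal, but the hybrid junctions (CCCGGA and TCCGGG) match neither recognition sequence. That is what allows the joined units to survive later rounds of digestion with the same two enzymes.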

Twelve white colonies (indicating inserts were
present in the plasmid) resulted, which were screened by
digesting plasmid DNA obtained from the colonies
(SCREENMAX™, J.T. Baker) with BssHII to release the
insert. The insert sizes were determined by agarose gel
electrophoresis.

Four colonies contained inserts of the predicted
size. Plasmid DNA was prepared from those colonies by
SCREENMAX™ and subjected to sequencing. One colony
harbored a plasmid (hereafter referred to as pS2U)
containing an insert that was usable, although its
structure was not exactly as designed. The ninth base of
S2short was changed to a G, most likely a result of a
synthesis error, although the difference may also have
been a mistake incorporated by the polymerase or a
mutation occurring during the cloning manipulations. In
addition, the first base of S2short is missing (or the
first base of the Eco RV site; it is impossible to
determine which). This could be due to nonspecific
nuclease activity in restriction enzymes used to perform
the recombinant DNA manipulations. However, these
changes are not critical, since the G appears in a
wobble position in the coding sequence, and the
alteration of the blunt end ligation site may even have
provided some advantages, putting several codons for
arginine directly after the MaSP2 sequence.

The insert was doubled, except for the additional
arginine encoding codons, by manipulation of the
restriction sites imbedded by design at the ends of the
unit consensus sequence as well as a unique Sca I site
in the ampicillin resistance gene of pBLUESCRIPT™ II
(see Figure 7). Plasmid from a miniprep is digested
with Sca I, then divided into two aliquots. One aliquot
is digested with Xma I and the other with Bsp EI. The
digests are electrophoresed on 0.8% soft agarose,
the appropriate bands excised with a razor blade, and
the DNA extracted using the standard procedure provided
with β-agarase (New England Biolabs). The Sca I-Xma I
segment containing one copy of the unit is then ligated
to the Sca I-Bsp EI segment also containing one copy of
the unit, thus effectively doubling the insert size
while keeping both units in frame and regenerating the
ampicillin resistance. This strategy can be repeated to
derive any number of repeats of the unit desired (until
secondary structure or insert size interferes). Thus an
engineered vector encoding a polypeptide comprising 16
repeats of the MaSP2 consensus sequence was constructed
in pBLUESCRIPT™ II.
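
Because each round of the strategy joins two copies of the current insert, the repeat count grows as a power of two. The small Python sketch below counts the rounds needed to reach a target and writes out the idealized 16-unit polypeptide; `doubling_rounds` is a hypothetical helper, and the multimer ignores the vector-derived leader and trailing arginine residues mentioned in the text.

```python
def doubling_rounds(target_repeats, start=1):
    """Number of doubling rounds needed to go from `start` copies to `target_repeats`."""
    rounds = 0
    copies = start
    while copies < target_repeats:
        copies *= 2
        rounds += 1
    if copies != target_repeats:
        raise ValueError("target is not reachable by pure doubling from start")
    return rounds

# Repeat unit encoded by the synthetic segment, as given in the text.
UNIT = "PGGYGPGQQGPGGYGPGQQGPSGPGSAAAAAAAAG"

rounds = doubling_rounds(16)
multimer = UNIT * 16  # idealized repeat region only, no flanking residues

print(rounds, len(multimer))
```

Repeat counts that are not powers of two would need a variation on the scheme, for example ligating fragments taken from two different rounds, which the sketch does not attempt to model.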

The insert encoding 16 repeating units of the MaSP2
consensus sequence was placed in pET19b by cutting the
Hinc II site of pBLUESCRIPT™ (creating a blunt end) then
ligating a Bam HI linker of the appropriate size to that
end. The fragment was then subjected to Bam HI cleavage,
which cut at both ends, due to the presence of a Bam HI
site in pBLUESCRIPT™ a few bases 5' of the insert. This
5' Bam HI site was engineered to be in frame with the
Bam HI insertion site of the pET system of vectors
(Novagen). As noted below, the pET vector system allows
affinity purification of expressed proteins using
affinity recognition of a polyhistidine leader sequence
attached to the desired protein. The insert was agarose
gel purified, ligated into Bam HI-cut, phosphatased
pET19b and the result used to transform competent SURE™
E. coli (Stratagene). The resultant colonies were
screened and the orientation of the inserts determined
by restriction digest. Clones with properly oriented
inserts were then used for expression experiments.

BL21 DE3 E. coli (Novagen) were transformed with a
plasmid having the insert in the desired orientation
(pET19b-16) and plated on LB agarose plates containing
chloramphenicol and carbenicillin. Antibiotic resistant
colonies were picked and grown in LB medium containing
chloramphenicol and carbenicillin to an OD600 of about
0.8. One mL of the resulting inoculum was saved as a
freezer stock. Inoculum cultures should be grown to OD600
of 0.8 or less, in order to maintain antibiotic
selection pressure.

Five mL of the inoculum was used to inoculate 50 mL
of LB containing the antibiotics. When the OD600 reached
0.8, the cells were collected by centrifugation and
resuspended in 50 mL of fresh medium. The resuspended
culture was diluted into 500 mL of LB containing the
antibiotics and culture was continued until the OD600
reached 0.8. IPTG was added to a concentration of 0.8
mM to initiate expression of the synthetic spider silk
gene.

After four hours, the cells were collected by
centrifugation and resuspended in a lysis buffer
modified from the method of Sambrook et al. (50 mM Tris-Cl
(pH 8.0), 10 mM MgCl2, 100 mM NaCl), and lysed with
lysozyme in the presence of PMSF according to Sambrook
et al. [J. Sambrook et al., Molecular Cloning: A
Laboratory Manual, 2nd ed., pp. 17.23-17.44, c. 1989 by
Cold Spring Harbor Laboratory Press, Cold Spring Harbor,
NY.].

The MaSP2 consensus polypeptide was purified from
the lysate by affinity purification using a Ni²⁺ column,
as described by the technical manual provided by the
manufacturer (Novagen). The divalent metal complexes
the polyhistidine leader sequence encoded by the pET
vector. A single step affinity purification provided
the desired fusion protein at 95% purity.

For cleavage of the polyhistidine leader peptide,
the eluate from the affinity column was dialysed against
distilled water for 24 hours to remove salts. The
solution was made to 25 mM in ammonium bicarbonate and
TPCK-treated trypsin was added to 1/20 the amount of the
protein content of the eluate by weight. The digestion
reaction was incubated at 37°C for 4 hours. An
additional aliquot of trypsin was added and the
incubation was continued for an additional 4 hours. The
leader peptide fragment was separated from the synthetic
spider silk polypeptide by gel filtration chromatography
on SEPHADEX™ G-50. Figure 8 shows the results obtained
using the above-described system. Approximately 10 mg
of the MaSP2 consensus polypeptide are obtained from a
500 mL culture. The molecular weight of 58 kDa is the
expected molecular weight for the polypeptide having a
sequence of 16 repeats of the MaSP2 consensus sequence.

Example 3: A Generalized Method for Preparing Vectors
for Expression of Spider Silk Protein Consensus
Polypeptides

Following is a general method for generating
artificial genes for any repetitive protein that
contains polyalanine stretches. The method can thus be
applied to express a protein comprising the consensus
polypeptide of any of the major or minor ampullate
spidroin proteins described herein.

The method employs two particular restriction
enzymes, Sfi I and AlwN I (recognition sites shown
below):

Sfi I: GGCCNNNN/NGGCC
AlwN I: CAGNNN/CTG

An oligomer is designed such that a Bam HI site is in 

25 frame with and immediately precedes an Sfi I site. The 
Bam H I site will also be in frame with the pET system 
of vectors which are used for expression. However, the 
manipulations which are needed to produce multiple 
copies of the artificial unit will not involve this 

30 site, since it is 5' to the Sfi I site. Sfi I and AlwN 
I are used as the primary enzymes for manipulations for 
unit multiplication because the recognition sequences of 
both of these enzymes can (1) code for polyalanine 
stretches (see below) and (2) can form a pair of 

35 compatible, nonregenerable sites. 
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Ala Ala Ala Ala Ala Ala Ala 

Sfi I : G/GCC/GCA/GCG/GCC AlwN I : GCA/GCA/GCT 
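
The two properties claimed for these sequences (each contains the enzyme's recognition site, and each reads as alanine codons) can be checked mechanically. The following is a minimal sketch, not part of the described method: the helper names are hypothetical, and the regular expressions simply transcribe the recognition sites given above with N standing for any base. Note one assumption made explicit in the code: the AlwN I example is nine bases long, so the site is only complete if the next codon begins with G (as an alanine or glycine codon would).

```python
import re

# Recognition sites as given in the text, with N = any base.
SFI_I = re.compile(r"GGCC[ACGT]{5}GGCC")   # GGCCNNNN/NGGCC
ALWN_I = re.compile(r"CAG[ACGT]{3}CTG")    # CAGNNN/CTG

ALA_CODONS = {"GCA", "GCC", "GCG", "GCT"}

# Example sequences from the text, with the slashes removed.
SFI_EXAMPLE = "GGCCGCAGCGGCC"   # G/GCC/GCA/GCG/GCC
ALWN_EXAMPLE = "GCAGCAGCT"      # GCA/GCA/GCT


def codons(seq: str, offset: int = 0) -> list:
    """Split seq into complete codons, starting at the given frame offset."""
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]


def is_polyalanine(seq: str, offset: int = 0) -> bool:
    """True if every complete codon in the chosen frame encodes alanine."""
    return all(c in ALA_CODONS for c in codons(seq, offset))


# The Sfi I example is read in frame after the leading G (offset 1).
print(bool(SFI_I.fullmatch(SFI_EXAMPLE)), is_polyalanine(SFI_EXAMPLE, 1))

# Assumption: the codon after GCT starts with G, completing CAGNNNCTG.
print(bool(ALWN_I.search(ALWN_EXAMPLE + "G")), is_polyalanine(ALWN_EXAMPLE))
```

Both lines print `True True`: the Sfi I example encodes four alanines and the AlwN I example three, while each still carries its recognition site.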

Two oligonucleotides are designed that are reverse
complements of each other at their 3' ends, allowing
hybridization. The first contains the Bam HI site,
followed (in frame) by the Sfi I site representing the
polyalanine region of MiSP1, followed by DNA encoding
approximately two-thirds of the repetitive portion of
MiSP1. The second oligonucleotide will be the
anti-coding strand of MiSP1, starting with an AlwN I
site and encoding approximately two-thirds of the
repetitive region.

The simple diagram below shows the intended overlap
of the oligonucleotides and the placement of the
restriction enzyme sites (B = Bam HI, S = Sfi I,
A = AlwN I).

5' B-S-------------------- 3'
          3' --------------------A 5'

After hybridization, the overhanging ends are filled
in with VENT™ polymerase. The resultant double-stranded
product is digested with Bam HI and, after agarose gel
purification, cloned into a Bam HI-cut, Eco RV-cut
pBLUESCRIPT™ II vector. This ligation mixture is
digested with Eco RI (to reduce background) and used to
transform competent SURE™ E. coli cells. Plasmid DNA is
prepared from the resulting colonies and screened first for
insert size, then sequenced to determine whether the insert
is properly integrated.

To double the insert to the appropriate size, double
digests with Sca I (found in the Ampr gene of
pBLUESCRIPT™) and either Sfi I or AlwN I are performed
and the resultant fragments gel purified. The 5' Sca
I-Sfi I-AlwN I 3' fragment of the Sca I + AlwN I digest
is ligated to the 5' Sfi I-AlwN I-Sca I 3' fragment
from the Sca I + Sfi I digest. This will regenerate a
functional pBLUESCRIPT™ II which will include a doubled
artificial gene. Since Sfi I and AlwN I ends are
compatible, they will ligate, but the resulting splice
site will not regenerate a recognition site for either
enzyme. This allows the doubling to be extended to 4-,
8-, 16-, and higher multimers of the original insert.
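
The doubling logic can be illustrated with a small top-strand simulation. This is a sketch under stated assumptions, not the procedure itself. The repeat body `BODY` is a hypothetical placeholder rather than the MiSP1 sequence, the vector backbone and the Sca I site are left out, and the code assumes that both enzymes leave a three-base 3' overhang ending at the cut position marked "/" in the recognition sites given earlier.

```python
import re

# Lookahead patterns so that overlapping sites are all counted.
SFI_I = re.compile(r"(?=GGCC[ACGT]{5}GGCC)")
ALWN_I = re.compile(r"(?=CAG[ACGT]{3}CTG)")
SFI_TOP_CUT = 8    # GGCCNNNN/NGGCC: top strand cut after base 8
ALWN_TOP_CUT = 6   # CAGNNN/CTG: top strand cut after base 6
OVERHANG = 3       # assumed 3-base 3' overhang for both enzymes

SFI_SITE = "GGCCGCAGCGGCC"    # G/GCC/GCA/GCG/GCC from the text
BODY = "GGTGGATACGGT"         # hypothetical repeat body (Gly Gly Tyr Gly)
ALWN_SITE = "GCAGCAGCTG"      # GCA/GCA/GCT plus a following G
UNIT = SFI_SITE + BODY + ALWN_SITE


def unique_site(pattern, seq: str) -> int:
    """Return the start of the only match; fail if there is not exactly one."""
    starts = [m.start() for m in pattern.finditer(seq)]
    if len(starts) != 1:
        raise ValueError(f"expected one site, found {len(starts)}")
    return starts[0]


def double_insert(insert: str) -> str:
    """Join the AlwN I-cut upstream copy to the Sfi I-cut downstream copy."""
    sfi = unique_site(SFI_I, insert)
    alwn = unique_site(ALWN_I, insert)
    sfi_cut, alwn_cut = sfi + SFI_TOP_CUT, alwn + ALWN_TOP_CUT
    # The ends are compatible only if the two overhangs are identical.
    if insert[sfi_cut - OVERHANG:sfi_cut] != insert[alwn_cut - OVERHANG:alwn_cut]:
        raise ValueError("overhangs are not compatible")
    return insert[:alwn_cut] + insert[sfi_cut:]


multimer = UNIT
for _ in range(4):
    multimer = double_insert(multimer)   # 2, 4, 8, then 16 copies

print(multimer.count(BODY))
```

With these example sequences both overhangs are CAG, the junction reads GCA GCA GCG GCC (four alanine codons), and neither recognition site is recreated at the splice, so each product still has exactly one Sfi I site and one AlwN I site and can be doubled again. Four rounds give 16 copies of the body.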

The final vector+multimer can then be cut with Hinc
II, ligated with Bam HI linkers of an appropriate
length, cut with Bam HI to liberate the insert, and
cloned directly into the pET system of vectors for
expression.

Example 4: Optimization of Expression of DNA
Encoding Spider Silk Proteins

In order to increase the yield of spider silk
proteins expressed from cloned DNA in bacteria, the
above-described culture methods can be modified. In
particular, due to the large proportion of glycine,
alanine, glutamine and proline in the proteins,
supplementation of the culture medium used to grow cells
for expression with these amino acids is expected to
allow increased yield of the spider silk protein. Also,
the culture density can be increased by use of
high-density fermentation methods standard in the art [See,
e.g. Riesenberg et al., Applied Microbiology and
Biotechnology 34:77 (1990); Alberghina et al., Applied
Microbiology and Biotechnology 34:82 (1990)]. For
instance, increasing the OD600 at which expression is
initiated from 0.8 to 20 would be expected to produce a
concomitant increase in yield from 20 mg/L to 480 mg/L.
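
As a rough cross-check of the scale of this estimate, the sketch below assumes that yield is strictly proportional to the culture density at induction. That proportionality is an assumption made here for illustration; the text does not state how its figure was derived, and the function name is hypothetical.

```python
def scaled_yield_mg_per_l(base_yield: float, base_od: float, target_od: float) -> float:
    """Scale a yield linearly with the optical density at induction."""
    return base_yield * (target_od / base_od)


# 20 mg/L when induced at an OD of 0.8; induction at an OD of 20 instead.
estimate = scaled_yield_mg_per_l(20.0, 0.8, 20.0)
print(round(estimate))
```

A strictly proportional scaling gives about 500 mg/L, which is of the same order as the 480 mg/L quoted above.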

The vector used to support replication of the cloned
DNA and to drive its expression can also be changed.
The basic pET system described above is available from
the supplier (Novagen) in many variations. One
characteristic which makes the pET system advantageous
is that expression of inserts in the pET vectors is very
tightly regulated. Very little of the cloned DNA is
expressed until transcription of the insert DNA is
induced. When transcription is induced, additional
elements of the pET vector inhibit production of host
cell proteins, thereby putting most of the protein
synthetic resources of the cell to work to make protein
encoded by the insert DNA.

However, the use of chloramphenicol and
carbenicillin resistance to provide selection pressure
is disadvantageous for high-level expression of
proteins. Accordingly, use of a different antibiotic
selection, e.g. kanamycin resistance, is expected to
provide increased yields of protein by expression of DNA
cloned in pET vectors.

Another advantage of the system used in the present
case is that the polyhistidine leader peptide provides
an affinity purification method that can be used even in
the presence of chaotropic agents. This would allow
purification of spider silk proteins fused to such a
polyhistidine sequence which might be made in "inclusion
bodies", aggregates of insoluble protein that require
harsh solubilization procedures prior to purification.

The host cell strain used for expression can also
be optimized. Cells having a high level of tRNA for
Ala, Gln, Gly and Pro codons could be made and used for
expression of spider silk proteins. Also, the cellular
protease complement of the cells can be manipulated to
minimize degradation of the expressed protein.

It is considered that the spider silk proteins of
the present invention can be expressed in appropriately
engineered insect cells, using commonly available
baculovirus vectors.

Example 5: Preparation of Fibers From
Spider Silk Proteins

As noted above, the spider silk proteins can be
viewed as derivatized polyamides. Accordingly, the
methods for producing fiber from soluble spider silk
proteins are similar to those used to produce typical
polyamide fibers, e.g. nylons, and the like.

O'Brien et al. [supra] describe fiber production
from adenovirus fiber proteins. In a typical fiber
production, the spider silk proteins are solubilized in
a strongly polar solvent. The protein solution is
typically greater than 5% in protein concentration. The
solution is preferably between 8 and 20% in protein.

Fibers are preferably spun from solutions
demonstrating properties indicating a liquid crystal
phase. The concentration at which the phase transition
will occur is different for particular polypeptide
compositions. However, the phase transition can be
monitored by observing the clarity and birefringence of
the solution. Onset of a liquid crystal phase is
detected by a translucent appearance of the solution and
the observation of birefringence when the solution is
viewed through crossed polarizing filters.

The solvent used to dissolve the spider silk protein
is preferably highly polar. Such solvents are
exemplified by di- and tri-haloacetic acids and
haloalcohols (e.g. hexafluoroisopropanol). In some
instances, co-solvents such as acetone are useful.
Also, solutions of chaotropic agents, such as lithium
thiocyanate, guanidine thiocyanate or urea can be used.

In one fiber-forming technique, fibers are first
extruded from the protein solution through an orifice
into methanol, until a length sufficient to be picked up
by a mechanical means is produced. Then the fiber is
pulled by such mechanical means through the methanol
solution, collected and dried. The methods for drawing
fibers are considered well-known in the art. Fibers
made from the 58 kDa synthetic MaSP consensus
polypeptide, described in Example 2, for instance, can
be drawn by methods similar to those used for drawing
low molecular weight nylons.

The invention being thus described, various
modifications of the materials and methods disclosed
herein will be apparent to one of skill in the art.
Such modifications are to be considered encompassed by
the scope of the invention described by the claims
below. Articles of the scientific and patent literature
cited herein are incorporated by reference in their
entirety by such citation.

SEQUENCE LISTING



(1) GENERAL INFORMATION: 

(i) APPLICANT: Lewis, Randolph V. 

Colgin, Mark

(ii) TITLE OF INVENTION: cDNAs Encoding Minor Ampullate Spider 
Silk Proteins 

(iii) NUMBER OF SEQUENCES: 56 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Birch, Stewart, Kolasch & Birch 

(B) STREET: P.O. Box 747 

(C) CITY: Falls Church 

(D) STATE: Virginia 

(E) COUNTRY: USA 

(F) ZIP: 22040-3487 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.25

(vi) CURRENT APPLICATION DATA:

(A) APPLICATION NUMBER: US 08/209,747 

(B) FILING DATE: 14-MAR-1994 

(C) CLASSIFICATION: 

(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: Murphy Jr., Gerald M. 

(B) REGISTRATION NUMBER: 28,977 

(C) REFERENCE/DOCKET NUMBER: 1447-104P

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: 703-205-8000 

(B) TELEFAX: 703-205-8050 



(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2793 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE : cDNA 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Nephila clavipes 

(F) TISSUE TYPE: minor ampullate gland 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 183..2675

(D) OTHER INFORMATION: /product= "N. clavipes minor
ampullate silk protein"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

ACATACTAGG TTTGGTGCCG GAGCTGGAGC TGGTACGTCT GTGCAGAAAT ACTTTGCACA 60

TCACTTCTCC AATTGCTTCT CGGGTATTTG TCAAATGATT AGTTCTACAA CTTCTACTGA 120

TCATGCAGTA AGTGTTGCTA CGAGCGTTGC GCTGAAGTCA GCTTGGACTT GATGCAAATG 180

CTATGAACAA CTTACTAGGT GCCGTTAGTG GATATGTTTC GACACTAGGC AACGCTATTT 240

CTGATGCTTC GGCATACGCA AATGCTCTTT CTTCCGCTAT AGGAAATGTG TTAGCTAATT 300

CCGGTTCAAT [illegible] ACTGCATCTT CTGCTGCTTC CAGTGCTGCT TCTTCAGTCA 360

CTACAACTTT GACGTCTTAT GGACCAGCTG TATTTTACGC ACCTTCTGCA TCATCTGGAG 420

GCTATGGAGC TGGAGCTGGA GCTGTTGCTG CAGCAGGAGC TGCCGGCGCT GGAGGTTACG 480

GAAGAGGTGC TGGAGGCTAC GGTGGACAAG GAGGATATGG TGCCGGAGCC GGAGCTGGTG 540

[illegible] TGCTGGAGCA GGAGCCGGAG GCGCTGGTGG TTACGGTAGA GGTGCTGGTG 600

[illegible] TGCGGCTGCT GGGGCAGGTG CAGGCGCCGG TGGTGCTGGA TATGGTGGAC 660

[illegible] TGGTGCCGGA GCAGGAGCTG GTGCGGCTGC TGCTGCTGGT GCAGGAGCAG 720

[illegible] CGGTTACGGT AGAGGTGCTG GTGCTGGAGC AGGAGCCGCT GCGGGTGCTG 780

[illegible] CTACGGTGGT CAAGGTGGGT ACGGTGCCGG AGCAGGAGCT GGTGCGGCTG 840

[illegible] TGGAGCAGGA TCTGGAGGCG CTGGCGGTTA CGGTAGAGGT GCTGGTGCTG 900

[illegible] CGCTGCAGGT GCAGGAGCAG GAGCTGGAAG CTACGGTGGT CAAGGATACG 960

[illegible] AGGAGCTGGT GCTGCTGCAG CTGCANNNNN NNNNNNNNNN NNNNNNNNNN 1020

[illegible] NNNNNNNGGT GCAGGTGCAG GTGCTGGATA TGGTGGACAA GGCGGATATG 1080

GTGCCGGAGC AGGAGCTGGT GCGGCTGCTG CTGCTGGTGC AGGAGCTGGA GGTGCTGGTG 1140

GTTACGGTAG AGGTGCTGGT GCTGGAGCTG GAGCCGCTGC AGGTGCAGGA GCAGGAGCTG 1200

GAGGCTACGG [illegible] GGATACGGTG CCGGAGCAGG AGCTGCTGCA GCTGCTGGAG 1260

CAGGAGCTGG [illegible] [illegible] AGGTGCTGGT GCTGGAGCAG GAGCCGCTGC 1320

GGGTGCTGGA [illegible] [illegible] AGGAGCTGGA GGCTACGGTG GTCAAGGTGG 1380

GTACGGTGCC GGTGCAGGAG CTGGTGCGGC TGCTGCTGCT GGAGCAGGAG CTGGAGGCGC 1440

TGGTGGTTAC GGTAGAGGTG CTGGTGCTGG AGCTGGAGCT GCTGCAGGCG CAGGAGCTGG 1500

AGGCTACGGT GGTCAAGGTG GATACGGTGC CGGAGCAGGA GCTGGTGCTG CTGCAGCTGC 1560

TGCAACAGGA GCCGGAGGCG CTGGTGGTTA CGGTAGAGGT GCTGGTGCTG GAGCTGGTGC 1620

CGCTGCTGGG GCAGGTGCAG GCACCGGTGG TGCTGGATAT GGTGGACAAG GCGGTTATGG 1680

TGCCGGAGCA GGAGCTGGTG CGGCTGCTGC TGCTGGTGCA GGAGCAGGAG GTGCTGGTTA 1740

CGGTAGAGGT GCTGGTGCTG GAGCTGGAGC TGCTGCAGGT GCTGGAGCTG GAGCCGCTGC 1800

AGGTGCAGGA GCAGGAGCTG GAGGCTACGG TGGTCAGGGT GGATACGGTG CCGGAGCAAG 1860

AGCTGGTGCT GCGGCAGCTG CTGGAGCAGG AGCTGGAGGC GCTGCGGGTT ACAGTAGAGG 1920

TGGTCGTGCA GGAGCCGCTG GTGCTGGAGC TGGAGCCGCT GCAGGTGCAG GAGCAGGAGC 1980

TGGAGGCTAC GGTGGTCAAG GTGGATACGG TGCCGGAGCA GGAGCTGGTG CTGCTGCAGC 2040

TGCTGGTGCA GGATCCGGAG GCGCTGGTGG TTACGGTAGA GGTGCTGGTG CTGGAGCCGC 2100

TGCAGGAGCT GGAGCCGCTG CAGGTGCTGG AGCAGGAGCT GGAGGCTACG GTGGTCAAGG 2160 

TGGATACGGT GCCGGAGCAG GAGCTGCTGC AGCTGCTGGA GCAGGAGCCG GACGTGGAGG 2220 

TTACGGAAGA GGTGCTGGTG CTGGAGGCTA CGGTGGACAA GGAGGATATG GTGCCGGAGC 2280 

TGGAGCCGGT GCTGCTGCAG CTGCTGGAGC GGGAGCCGGA GGCTATGGCG ACAAGGAGAT 2340 

AGCCTGCTGG AGCAGGTGTA GATACACTGT TGCCTCCACA ACATCTCGTT TGAGTTCGGC 2400 

CGAAGCATCT TCTAGGATAT CGTCGGCGGC TTCCACTTTA GTATCTGGAG GTTACTTGAA 2460 

TACAGCAGCT CTGCCATCGG TTATTTCGGA TCTTTTTGCC CAAGTTGGTG CATCTTCTCC 2520 

GGTGATCAGA CAGCGAAGTT TGATCCAAGT TTTGTTGGAA ATTGTTTCTT CTCTTATCCA 2580 

TATTCTCAGT TCTTCTAGCG TAGGACAAGT CGATTTCAGT TCGGTTGGGT CGTCTGCTGC 2640 

AGCTGTTGGT CAATCCATGC AAGTTGTAAT GGGCTAAACA TGATGGTTCT CTCAATTATG 2700 

TATTCTTTAA TTACCGCTAA GGTAGCAAAA TATTGTAAAG TAAAGTTTTC TTACAAAATA 2760 

AAAATTCTTT TCTGCAAAAA AAAAAAAAAA AAA 2793 
(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 832 amino acids 

(B) TYPE : amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(iii) HYPOTHETICAL: NO 

(v) FRAGMENT TYPE: internal 

(vi) ORIGINAL SOURCE:

(A) ORGANISM: N. clavipes 

(F) TISSUE TYPE: minor ampullate gland 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 1..309 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2: 

Met Asn Asn Leu Leu Gly Ala Val Ser Gly Tyr Val Ser Thr Leu Gly 
1 5 10 15 

Asn Ala Ile Ser Asp Ala Ser Ala Tyr Ala Asn Ala Leu Ser Ser Ala
20 25 30

Ile Gly Asn Val Leu Ala Asn Ser Gly Ser Ile Ser Glu Ser Thr Ala
35 40 45 

Ser Ser Ala Ala Ser Ser Ala Ala Ser Ser Val Thr Thr Thr Leu Thr 
50 55 60 

Ser Tyr Gly Pro Ala Val Phe Tyr Ala Pro Ser Ala Ser Ser Gly Gly 
65 70 75 80 

Tyr Gly Ala Gly Ala Gly Ala Val Ala Ala Ala Gly Ala Ala Gly Ala
85 90 95

Gly Gly Tyr Gly Arg Gly Ala Gly Gly Tyr Gly Gly Gln Gly Gly Tyr
100 105 110

Gly Ala Gly Ala Gly Ala Gly Ala Ala Ala Ala Ala Gly Ala Gly Ala 
115 120 125 

Gly Gly Ala Gly Gly Tyr Gly Arg Gly Ala Gly Ala Gly Ala Gly Ala 
130 135 140 

Ala Ala Gly Ala Gly Ala Gly Ala Gly Gly Ala Gly Tyr Gly Gly Gln
145 150 155 160 

Gly Gly Tyr Gly Ala Gly Ala Gly Ala Gly Ala Ala Ala Ala Ala Gly 
165 170 175 

Ala Gly Ala Gly Gly Ala Gly Gly Tyr Gly Arg Gly Ala Gly Ala Gly 
180 185 190 

Ala Gly Ala Ala Ala Gly Ala Gly Ala Gly Gly Tyr Gly Gly Gln Gly
195 200 205 

Gly Tyr Gly Ala Gly Ala Gly Ala Gly Ala Ala Ala Ala Ala Ala Gly 
210 215 220 

Ala Gly Ser Gly Gly Ala Gly Gly Tyr Gly Arg Gly Ala Gly Ala Gly 
225 230 235 240 

Ala Gly Ala Ala Ala Gly Ala Gly Ala Gly Ala Gly Ser Tyr Gly Gly 
245 250 255 

Gln Gly Tyr Gly Ala Gly Ala Gly Ala Gly Ala Ala Ala Ala Ala Xaa
260 265 270 

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Gly Ala 
275 280 285 

Gly Ala Gly Ala Gly Tyr Gly Gly Gln Gly Gly Tyr Gly Ala Gly Ala
290 295 300 

Gly Ala Gly Ala Ala Ala Ala Ala Gly Ala Gly Ala Gly Gly Ala Gly 
305 310 315 320 

Gly Tyr Gly Arg Gly Ala Gly Ala Gly Ala Gly Ala Ala Ala Gly Ala 
325 330 335 

Gly Ala Gly Ala Gly Gly Tyr Gly Gly Gln Ser Gly Tyr Gly Ala Gly
340 345 350 

Ala Gly Ala Ala Ala Ala Ala Gly Ala Gly Ala Gly Gly Ala Gly Gly 
355 360 365 

Tyr Gly Arg Gly Ala Gly Ala Gly Ala Gly Ala Ala Ala Gly Ala Gly 
370 375 380 

Ala Gly Ala Ala Ala Gly Ala Gly Ala Gly Gly Tyr Gly Gly Gln Gly
385 390 395 400 

Gly Tyr Gly Ala Gly Ala Gly Ala Gly Ala Ala Ala Ala Ala Gly Ala 
405 410 415 

Gly Ala Gly Gly Ala Gly Gly Tyr Gly Arg Gly Ala Gly Ala Gly Ala 
420 425 430 

Gly Ala Ala Ala Gly Ala Gly Ala Gly Gly Tyr Gly Gly Gln Gly Gly
435 440 445

Tyr Gly Ala Gly Ala Gly Ala Gly Ala Ala Ala Ala Ala Ala Thr Gly
450 455 460 

Ala Gly Gly Ala Gly Gly Tyr Gly Arg Gly Ala Gly Ala Gly Ala Gly 
465 470 475 480 

Ala Ala Ala Gly Ala Gly Ala Gly Thr Gly Gly Ala Gly Tyr Gly Gly 
485 490 495 

Gln Gly Gly Tyr Gly Ala Gly Ala Gly Ala Gly Ala Ala Ala Ala Ala
500 505 510 

Gly Ala Gly Ala Gly Gly Ala Gly Tyr Gly Arg Gly Ala Gly Ala Gly 
515 520 525 

Ala Gly Ala Ala Ala Gly Ala Gly Ala Gly Ala Ala Ala Gly Ala Gly 
530 535 540 

Ala Gly Ala Gly Gly Tyr Gly Gly Gln Gly Gly Tyr Gly Ala Gly Ala
545 550 555 560 

Arg Ala Gly Ala Ala Ala Ala Ala Gly Ala Gly Ala Gly Gly Ala Ala 
565 570 575 

Gly Tyr Ser Arg Gly Gly Arg Ala Gly Ala Ala Gly Ala Gly Ala Gly 
580 585 590 

Ala Ala Ala Gly Ala Gly Ala Gly Ala Gly Gly Tyr Gly Gly Gln Gly
595 600 605 

Gly Tyr Gly Ala Gly Ala Gly Ala Gly Ala Ala Ala Ala Ala Gly Ala 
610 615 620 

Gly Ser Gly Gly Ala Gly Gly Tyr Gly Arg Gly Ala Gly Ala Gly Ala 
625 630 635 640 

Ala Ala Gly Ala Gly Ala Ala Ala Gly Ala Gly Ala Gly Ala Gly Gly 
645 650 655 

Tyr Gly Gly Gln Gly Gly Tyr Gly Ala Gly Ala Gly Ala Ala Ala Ala
660 665 670 

Ala Gly Ala Gly Ala Gly Arg Gly Gly Tyr Gly Arg Gly Ala Gly Ala 
675 680 685 

Gly Gly Tyr Gly Gly Gln Gly Gly Tyr Gly Ala Gly Ala Gly Ala Gly
690 695 700 

Ala Ala Ala Ala Ala Gly Ala Gly Ala Gly Gly Tyr Gly Asp Lys Glu 
705 710 715 720

Ile Ala Cys Trp Ser Arg Cys Arg Tyr Thr Val Ala Ser Thr Thr Ser
725 730 735

Arg Leu Ser Ser Ala Glu Ala Ser Ser Arg Ile Ser Ser Ala Ala Ser
740 745 750

Thr Leu Val Ser Gly Gly Tyr Leu Asn Thr Ala Ala Leu Pro Ser Val
755 760 765

Ile Ser Asp Leu Phe Ala Gln Val Gly Ala Ser Ser Pro Val Ile Arg
770 775 780

Gln Arg Ser Leu Ile Gln Val Leu Leu Glu Ile Val Ser Ser Leu Ile
785 790 795 800

His Ile Leu Ser Ser Ser Ser Val Gly Gln Val Asp Phe Ser Ser Val
805 810 815

Gly Ser Ser Ala Ala Ala Val Gly Gln Ser Met Gln Val Val Met Gly
820 825 830



(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 309 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: N. clavipes 

(F) TISSUE TYPE: minor ampullate gland 

(ix) FEATURE:

(A) NAME/KEY: CDS 

(B) LOCATION: 1..309 

(D) OTHER INFORMATION: /product = "amino terminus of MISP2 
protein" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

TCT TAT GGA CCA TCC GTA TTT TAC ACT CCT ACT TCA GCT GGA AGC TAT 48 
Ser Tyr Gly Pro Ser Val Phe Tyr Thr Pro Thr Ser Ala Gly Ser Tyr 
1 5 10 15

GGT GCA GGG GCC GGA GGT TTT GGA GCT GGA GCC TCT GCT GGT GTC GGA 96 
Gly Ala Gly Ala Gly Gly Phe Gly Ala Gly Ala Ser Ala Gly Val Gly 
20 25 30 

GCC GGA GCT GGT ACT GTA GCA GGA TAT GGT GGA CAA GGA GGA TAT GGT 144 
Ala Gly Ala Gly Thr Val Ala Gly Tyr Gly Gly Gln Gly Gly Tyr Gly
35 40 45 

GCC GGA AGC GCT GGA GGT TAT GGA AGA GGT ACT GGA GCT GGA GCC GCT 192 
Ala Gly Ser Ala Gly Gly Tyr Gly Arg Gly Thr Gly Ala Gly Ala Ala 
50 55 60 

GCT GGT GCC GGA GCA GGA GCC ACT GCT GGT GCC GGA GCA GGA GCC GCT 240 
Ala Gly Ala Gly Ala Gly Ala Thr Ala Gly Ala Gly Ala Gly Ala Ala 
65 70 75 80

GCT GGT GCC GGA GCA GGA GCA GGT AAT TCA GGA GGA TAT AGT GCC GGA 288 
Ala Gly Ala Gly Ala Gly Ala Gly Asn Ser Gly Gly Tyr Ser Ala Gly 
85 90 95 

GTA GGA GTT GGT GCT GCA GCT 309 
Val Gly Val Gly Ala Ala Ala 
100 



(2) INFORMATION FOR SEQ ID NO:4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 103 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 

Ser Tyr Gly Pro Ser Val Phe Tyr Thr Pro Thr Ser Ala Gly Ser Tyr 
1 5 10 15

Gly Ala Gly Ala Gly Gly Phe Gly Ala Gly Ala Ser Ala Gly Val Gly 
20 25 30 

Ala Gly Ala Gly Thr Val Ala Gly Tyr Gly Gly Gln Gly Gly Tyr Gly
35 40 45 

Ala Gly Ser Ala Gly Gly Tyr Gly Arg Gly Thr Gly Ala Gly Ala Ala 
50 55 60 

Ala Gly Ala Gly Ala Gly Ala Thr Ala Gly Ala Gly Ala Gly Ala Ala 
65 70 75 80

Ala Gly Ala Gly Ala Gly Ala Gly Asn Ser Gly Gly Tyr Ser Ala Gly 
85 90 95 

Val Gly Val Gly Ala Ala Ala 
100 

(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 165 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: N. clavipes 

(F) TISSUE TYPE: minor ampullate gland 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 3..164

(D) OTHER INFORMATION: /product = "an internal portion of 
MISP2" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 

CT GCA GCT GCT GGA GGA GGT GCC GGA ACT GTT GGA GGT TAC GGA AGA 47
Ala Ala Ala Gly Gly Gly Ala Gly Thr Val Gly Gly Tyr Gly Arg 
1 5 10 15 

GGT GCT GGT GTA GGA GCA GGT GCC GCT GCT GGT TTT GCG GCA GGA GCT 95 
Gly Ala Gly Val Gly Ala Gly Ala Ala Ala Gly Phe Ala Ala Gly Ala 
20 25 30 

GGT GGT GCT GGA GGC TAC AGA AGA GAT GGA GGA TAC GGT GCT GGA GCA 143 
Gly Gly Ala Gly Gly Tyr Arg Arg Asp Gly Gly Tyr Gly Ala Gly Ala 
35 40 45

GGA GCT GGA GCT GCT GCA GCT G 165 
Gly Ala Gly Ala Ala Ala Ala 
50

(2) INFORMATION FOR SEQ ID NO:6:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 54 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

Ala Ala Ala Gly Gly Gly Ala Gly Thr Val Gly Gly Tyr Gly Arg Gly 
1 5 10 15

Ala Gly Val Gly Ala Gly Ala Ala Ala Gly Phe Ala Ala Gly Ala Gly 
20 25 30 

Gly Ala Gly Gly Tyr Arg Arg Asp Gly Gly Tyr Gly Ala Gly Ala Gly 
35 40 45 

Ala Gly Ala Ala Ala Ala 
50 

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 870 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: N. clavipes 

(F) TISSUE TYPE: minor ampullate gland

(ix) FEATURE: 

(A) NAME/KEY: CDS

(B) LOCATION: 1..753

(D) OTHER INFORMATION: /product = "MISP2 carboxy terminus" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 

GGT GCA GGA GGC TAT GGA AGA GGT GCT GGA GCT GGA GCT GCT GCA GTC 48 
Gly Ala Gly Gly Tyr Gly Arg Gly Ala Gly Ala Gly Ala Ala Ala Val 
1 5 10 15 

GCA GGT GCA GAT GCT GGT GGC TAT GGA AGA AAT TAT GGT GCT GGA ACC 96 
Ala Gly Ala Asp Ala Gly Gly Tyr Gly Arg Asn Tyr Gly Ala Gly Thr 
20 25 30 

ACT GCT TAT GCA GGA GCC AGA GCC GGT GGT GCT GGA GGC TAT GGC GGA 144 
Thr Ala Tyr Ala Gly Ala Arg Ala Gly Gly Ala Gly Gly Tyr Gly Gly 
35 40 45 

CAA GGA GGA TAT TCT TCT GGA GCC GGT GCT GCT GCA GCT TCT GGA GCA 192 
Gln Gly Gly Tyr Ser Ser Gly Ala Gly Ala Ala Ala Ala Ser Gly Ala
50 55 60 

GGA GCC GAT ATC ACT AGT GGA TAC GGA AGA GGT GTT GGT GCT GGA GCT 240 
Gly Ala Asp Ile Thr Ser Gly Tyr Gly Arg Gly Val Gly Ala Gly Ala
65 70 75 80

GGA GCA GAA ACT ATA GGT GCT GGA GGC TAT GGA GGT GGG GCT GGA TCA 288
Gly Ala Glu Thr Ile Gly Ala Gly Gly Tyr Gly Gly Gly Ala Gly Ser
85 90 95 

GGA GCA CGT GCG GCT TCA GCA TCC GGA GCT GGT ACT GGA TAT GGT TCG 336 
Gly Ala Arg Ala Ala Ser Ala Ser Gly Ala Gly Thr Gly Tyr Gly Ser 
100 105 110

TCT GGA GGT TAT AAC GTA GGT ACC GGA ATA AGT ACT TCT TCT GGC GCT 384 
Ser Gly Gly Tyr Asn Val Gly Thr Gly Ile Ser Thr Ser Ser Gly Ala
115 120 125 

GCA TCT AGC TAC TCT GTT TCT GCT GGA GGT TAT GCT TCA ACA GGT GTT 432 
Ala Ser Ser Tyr Ser Val Ser Ala Gly Gly Tyr Ala Ser Thr Gly Val 
130 135 140 

GGT ATT GGA TCC ACT GTT ACA TCC ACA ACA TCT CGT TTG AGT TCT GCT 480 
Gly Ile Gly Ser Thr Val Thr Ser Thr Thr Ser Arg Leu Ser Ser Ala
145 150 155 160 

GAA GCA TGT TCT AGA ATA TCT GCT GCG GCT TCC ACT TTA GTA TCT GGA 528 
Glu Ala Cys Ser Arg Ile Ser Ala Ala Ala Ser Thr Leu Val Ser Gly
165 170 175 

TCC TTG AAT ACT GCA GCT TTA CCA TCT GTA ATT TCG GAT CTT TTT GCC 576 
Ser Leu Asn Thr Ala Ala Leu Pro Ser Val Ile Ser Asp Leu Phe Ala
180 185 190 

CAA GTT AGT GCA TCA TCA CCC GGG GTA TCA GGT AAC GAA GTT TTG ATT 624 
Gln Val Ser Ala Ser Ser Pro Gly Val Ser Gly Asn Glu Val Leu Ile
195 200 205 

CAA GTT TTG TTG GAA ATT GTT TCT TCT CTT ATC CAT ATT CTT AGT TCT 672 
Gln Val Leu Leu Glu Ile Val Ser Ser Leu Ile His Ile Leu Ser Ser
210 215 220 

TCT AGT GTA GGG CAA GTA GAT TTC AGT TCT GTT GGT TCA TCT GCT GCA 720 
Ser Ser Val Gly Gln Val Asp Phe Ser Ser Val Gly Ser Ser Ala Ala
225 230 235 240 

GCC GTT GGT CAA TCC ATG CAA GTT GTA ATG GGT TAAAACAAAA TGGCTCTCTC 773 
Ala Val Gly Gln Ser Met Gln Val Val Met Gly
245 250 

TCTGTTATAT GCATTCTGTA ATTTCTTCTA AACTATTAAA ATAATGTAAT AATTTCCTGC 833 

ATAAATAAAA ATATTTTTCT GCAAAAAAAA AAAAAAA 870 

(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 251 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

Gly Ala Gly Gly Tyr Gly Arg Gly Ala Gly Ala Gly Ala Ala Ala Val 
1 5 10 15

Ala Gly Ala Asp Ala Gly Gly Tyr Gly Arg Asn Tyr Gly Ala Gly Thr
20 25 30

Thr Ala Tyr Ala Gly Ala Arg Ala Gly Gly Ala Gly Gly Tyr Gly Gly
35 40 45

Gln Gly Gly Tyr Ser Ser Gly Ala Gly Ala Ala Ala Ala Ser Gly Ala
50 55 60

Gly Ala Asp Ile Thr Ser Gly Tyr Gly Arg Gly Val Gly Ala Gly Ala
65 70 75 80

Gly Ala Glu Thr Ile Gly Ala Gly Gly Tyr Gly Gly Gly Ala Gly Ser
85 90 95

Gly Ala Arg Ala Ala Ser Ala Ser Gly Ala Gly Thr Gly Tyr Gly Ser
100 105 110

Ser Gly Gly Tyr Asn Val Gly Thr Gly Ile Ser Thr Ser Ser Gly Ala
115 120 125 

Ala Ser Ser Tyr Ser Val Ser Ala Gly Gly Tyr Ala Ser Thr Gly Val 
130 135 140 

Gly Ile Gly Ser Thr Val Thr Ser Thr Thr Ser Arg Leu Ser Ser Ala
145 150 155 160

Glu Ala Cys Ser Arg Ile Ser Ala Ala Ala Ser Thr Leu Val Ser Gly
165 170 175

Ser Leu Asn Thr Ala Ala Leu Pro Ser Val Ile Ser Asp Leu Phe Ala
180 185 190

Gln Val Ser Ala Ser Ser Pro Gly Val Ser Gly Asn Glu Val Leu Ile
195 200 205

Gln Val Leu Leu Glu Ile Val Ser Ser Leu Ile His Ile Leu Ser Ser
210 215 220

Ser Ser Val Gly Gln Val Asp Phe Ser Ser Val Gly Ser Ser Ala Ala
225 230 235 240

Ala Val Gly Gln Ser Met Gln Val Val Met Gly
245 250 

(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 165 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: N. clavipes 

(F) TISSUE TYPE: minor ampullate gland 

(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..165

(D) OTHER INFORMATION: /label= cloned_cDNA 

/note= "pMISS3 partial sequence, 11-1 template,
forward primer"

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 1..165

(D) OTHER INFORMATION: /product = "translation of pMISS3
partial sequence" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

GCT GGA GCT GCT GCT GGT GCT GGA GGC TAT GAC GGA CAA GGA GGA TAT 48 
Ala Gly Ala Ala Ala Gly Ala Gly Gly Tyr Asp Gly Gln Gly Gly Tyr
1 5 10 15

GGT GCT GGA GCA GGA GCT GCT GCA GCT GCT GGA GCA GGA GCC GGA AGC 96 
Gly Ala Gly Ala Gly Ala Ala Ala Ala Ala Gly Ala Gly Ala Gly Ser 
20 25 30 

GTT GGA GGT TAT GGA ACA GGT GCT GTA GCT GGA TCT GGA ACA GCT GCT 144 
Val Gly Gly Tyr Gly Thr Gly Ala Val Ala Gly Ser Gly Thr Ala Ala 
35 40 45 

GGT GCA GGA GCC AGA GCT GGT 165 
Gly Ala Gly Ala Arg Ala Gly 
50 55 



(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 55 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

Ala Gly Ala Ala Ala Gly Ala Gly Gly Tyr Asp Gly Gln Gly Gly Tyr
1 5 10 15

Gly Ala Gly Ala Gly Ala Ala Ala Ala Ala Gly Ala Gly Ala Gly Ser 
20 25 30 

Val Gly Gly Tyr Gly Thr Gly Ala Val Ala Gly Ser Gly Thr Ala Ala 
35 40 45 

Gly Ala Gly Ala Arg Ala Gly 
50 55 

(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 240 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: N. clavipes 

(F) TISSUE TYPE: minor ampullate gland 

(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..240 

(D) OTHER INFORMATION: /label= cloned_cDNA 

/note= "partial sequence of pMISS3, 11-1 template,
reverse primer"



WO 95/25165 



PCT/US95/03139 




(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 1..240 

(D) OTHER INFORMATION: /product= "pMISS3 partial sequence 
translation" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 

GGA GCT GCT GCT GGT GCA GGA GCC GGA GCA GGT AGT ACA GGA GGC TTT 48 
Gly Ala Ala Ala Gly Ala Gly Ala Gly Ala Gly Ser Thr Gly Gly Phe 
1 5 10 15

GGC GGA CAA GGA GGA TAT GGT GCC GGT GCA GGA GCT GCA GCT GCT GGA 96 
Gly Gly Gln Gly Gly Tyr Gly Ala Gly Ala Gly Ala Ala Ala Ala Gly
20 25 30 

GCT TTT GCC GGA AGA GCT GGG GGT TAC GGA AGA GCT GCT GGA GCT GCG 144 
Ala Phe Ala Gly Arg Ala Gly Gly Tyr Gly Arg Ala Ala Gly Ala Ala 
35 40 45 

GCT GGA ACT GGA GCT GCT GCT GGT GCA GGA GCC GGA GCT GGT AGT ACA 192 
Ala Gly Thr Gly Ala Ala Ala Gly Ala Gly Ala Gly Ala Gly Ser Thr 
50 55 60

GGA GGC TTT GGC GGA CAA AGA GGA TAC GGT GCC GGC AGA AGT AAT GGA 240 
Gly Gly Phe Gly Gly Gln Arg Gly Tyr Gly Ala Gly Arg Ser Asn Gly
65 70 75 80



(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 80 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 

Gly Ala Ala Ala Gly Ala Gly Ala Gly Ala Gly Ser Thr Gly Gly Phe 
1 5 10 15

Gly Gly Gln Gly Gly Tyr Gly Ala Gly Ala Gly Ala Ala Ala Ala Gly
20 25 30 

Ala Phe Ala Gly Arg Ala Gly Gly Tyr Gly Arg Ala Ala Gly Ala Ala 
35 40 45 

Ala Gly Thr Gly Ala Ala Ala Gly Ala Gly Ala Gly Ala Gly Ser Thr 
50 55 60

Gly Gly Phe Gly Gly Gln Arg Gly Tyr Gly Ala Gly Arg Ser Asn Gly
65 70 75 80 



(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 144 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: cDNA

(vi) ORIGINAL SOURCE:

(A) ORGANISM: N. clavipes 

(F) TISSUE TYPE: minor ampullate gland 

(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..144 

(D) OTHER INFORMATION: /label= cloned_cDNA 

/note= "partial sequence of pMISS3, 11-2 template,
forward primer"

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 1..144 

(D) OTHER INFORMATION: /product= "translation of pMISS3 
partial sequence" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

TAT GGT GGA CAA GGC GGA TAT GGT GCT GGA GCA GGA GCT GGT GCT GCT 48 
Tyr Gly Gly Gln Gly Gly Tyr Gly Ala Gly Ala Gly Ala Gly Ala Ala
1 5 10 15

GCA GCC GCA GGA TAT GGA GCC GGT GCT GGA GGA TAC GGT GGA CAA GCT 96 
Ala Ala Ala Gly Tyr Gly Ala Gly Ala Gly Gly Tyr Gly Gly Gln Ala
20 25 30 

GGT TAT GGT GCC GGA GCT GGA GCT GGT AGT TCT GCA GGA AAT GCT TTC 144 
Gly Tyr Gly Ala Gly Ala Gly Ala Gly Ser Ser Ala Gly Asn Ala Phe 
35 40 45 

(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 48 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

Tyr Gly Gly Gln Gly Gly Tyr Gly Ala Gly Ala Gly Ala Gly Ala Ala
1 5 10 15 

Ala Ala Ala Gly Tyr Gly Ala Gly Ala Gly Gly Tyr Gly Gly Gln Ala
20 25 30 

Gly Tyr Gly Ala Gly Ala Gly Ala Gly Ser Ser Ala Gly Asn Ala Phe 
35 40 45 
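As a consistency check, the translation given as SEQ ID NO: 14 can be reproduced from the cDNA of SEQ ID NO: 13. The following is a minimal sketch that assumes the standard genetic code and reading frame 1; the function and variable names are illustrative and do not come from the specification.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA (frame 1) into one-letter amino acid codes."""
    dna = "".join(dna.split()).upper()
    usable = len(dna) - len(dna) % 3  # ignore any trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# SEQ ID NO: 13 as printed in the listing (codons separated by spaces).
SEQ_ID_13 = (
    "TAT GGT GGA CAA GGC GGA TAT GGT GCT GGA GCA GGA GCT GGT GCT GCT "
    "GCA GCC GCA GGA TAT GGA GCC GGT GCT GGA GGA TAC GGT GGA CAA GCT "
    "GGT TAT GGT GCC GGA GCT GGA GCT GGT AGT TCT GCA GGA AAT GCT TTC"
)

# SEQ ID NO: 14 rewritten in one-letter code.
SEQ_ID_14 = "YGGQGGYGAGAGAGAAAAAGYGAGAGGYGGQAGYGAGAGAGSSAGNAF"

protein = translate(SEQ_ID_13)
print(protein == SEQ_ID_14, len(protein))
```

The same helper could be applied to the other cDNA entries of the listing, provided the reading frame is known.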



(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 155 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(v) FRAGMENT TYPE: N-terminal

(ix) FEATURE:

(A) NAME/KEY: Peptide 

(B) LOCATION: 1..155 

(D) OTHER INFORMATION: /label= MISPN_aa 

/note= "amino-terminal sequence of MiSP1, see Fig.
4"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 

Met Asn Asn Leu Leu Phe Ala Val Ser Gly Tyr Val Ser Thr Leu Gly 
1 5 10 15 

Asn Ala Ile Ser Asp Ala Ser Ala Tyr Ala Asn Ala Leu Ser Ser Ala
20 25 30 

Ile Gly Asn Val Leu Ala Asn Ser Gly Ser Ile Ser Glu Ser Thr Ala
35 40 45 

Ser Ser Ala Ala Ser Ser Ala Ala Ser Ser Val Thr Thr Thr Leu Thr 
50 55 60 

Ser Tyr Gly Pro Ala Val Phe Tyr Ala Pro Ser Ala Ser Ser Gly Gly 
65 70 75 80 

Tyr Gly Ala Gly Ala Gly Ala Val Ala Ala Ala Gly Ala Ala Gly Ala 
85 90 95 

Gly Gly Tyr Gly Arg Gly Ala Gly Gly Tyr Gly Gly Gln Gly Gly Tyr
100 105 110

Gly Ala Gly Ala Gly Ala Gly Ala Ala Ala Ala Ala Gly Ala Gly Ala 
115 120 125 

Gly Gly Ala Gly Gly Tyr Gly Arg Gly Ala Gly Ala Gly Ala Gly Ala 
130 135 140 

Ala Ala Gly Ala Gly Ala Gly Ala Gly Gly Ala 
145 150 155 

(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 90 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(v) FRAGMENT TYPE: N-terminal

(ix) FEATURE: 

(A) NAME/KEY: Peptide 

(B) LOCATION: 1..90

(D) OTHER INFORMATION: /label= MISP2N_AA

/note= "amino terminal peptide of MISP2, see Fig.
4"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:

Ser Tyr Gly Pro Ser Val Phe Tyr Thr Pro Thr Ser Ala Gly Ser Tyr
1 5 10 15

Gly Ala Gly Ala Gly Ala Phe Gly Ala Gly Ala Ser Ala Gly Val Gly 
20 25 30

Ala Gly Ala Gly Thr Val Ala Gly Tyr Gly Gly Gln Gly Gly Tyr Gly
35 40 45 

Ala Gly Ala Gly Ser Ala Gly Gly Tyr Gly Arg Gly Thr Gly Ala Gly 
50 55 60 

Ala Ala Ala Gly Ala Gly Ala Gly Ala Thr Ala Gly Ala Gly Ala Gly 
65 70 75 80 

Ala Ala Ala Gly Ala Gly Ala Gly Ala Gly 
85 90 

(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 115 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(v) FRAGMENT TYPE: C-terminal

(ix) FEATURE: 

(A) NAME/KEY: Peptide 

(B) LOCATION: 1..115 

(D) OTHER INFORMATION: /label= MISP1C_AA 

/note= "carboxyl terminus of MISP1, see Fig. 4" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 

Asp Lys Glu Ile Ala Cys Trp Ser Arg Cys Arg Tyr Thr Val Ala Ser
1 5 10 15

Thr Thr Ser Arg Leu Ser Ser Ala Glu Ala Ser Ser Arg Ile Ser Ser
20 25 30

Ala Ala Ser Thr Leu Val Ser Gly Gly Tyr Leu Asn Thr Ala Ala Leu
35 40 45

Pro Ser Val Ile Ser Asp Leu Phe Ala Gln Val Gly Ala Ser Ser Pro
50 55 60

Val Ile Arg Gln Arg Ser Leu Ile Gln Val Leu Leu Glu Ile Val Ser
65 70 75 80

Ser Leu Ile His Ile Leu Ser Ser Ser Ser Val Gly Trp Val Asp Phe
85 90 95

Ser Ser Val Gly Ser Ser Ala Ala Ala Val Gly Gln Ser Met Gln Val
100 105 110

Val Met Gly 
115 

(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 116 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(v) FRAGMENT TYPE: C-terminal

(ix) FEATURE:

(A) NAME/KEY: Peptide

(B) LOCATION: 1..116

(D) OTHER INFORMATION: /label= MISP2C_AA 

/note= "carboxyl terminus of MISP2, see Fig. 4"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 

Gly Gly Tyr Ala Ser Thr Gly Val Gly Ile Gly Ser Thr Val Thr Ser
1 5 10 15

Thr Thr Ser Arg Leu Ser Ser Ala Glu Ala Cys Ser Arg Ile Ser Ala
20 25 30

Ala Ala Ser Thr Leu Val Ser Gly Gly Ser Leu Asn Thr Ala Ala Leu
35 40 45

Pro Ser Val Ile Ser Asp Leu Phe Ala Gln Val Ser Ala Ser Ser Pro
50 55 60

Gly Val Ser Gly Asn Glu Val Leu Ile Gln Val Leu Leu Glu Ile Val
65 70 75 80

Ser Ser Leu Ile His Ile Leu Ser Ser Ser Ser Val Gly Gln Val Asp
85 90 95

Phe Ser Ser Val Gly Ser Ser Ala Ala Ala Val Gly Gln Ser Met Gln
100 105 110 

Val Val Met Gly 
115 

(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 33 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(v) FRAGMENT TYPE: N-terminal 

(ix) FEATURE: 

(A) NAME/KEY: Peptide 

(B) LOCATION: 1..33 

(D) OTHER INFORMATION: /label= misp1_repeat
/note= "see Table 1" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 

Gly Ala Ala Gly Ala Gly Gly Tyr Gly Arg Gly Ala Gly Gly Tyr Gly 
1 5 10 15

Gly Gln Gly Gly Tyr Gly Ala Gly Ala Gly Ala Gly Ala Ala Ala Ala
20 25 30 

Ala 



(2) INFORMATION FOR SEQ ID NO: 20: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 51 amino acids

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(v) FRAGMENT TYPE: internal 



(ix) FEATURE: 

(A) NAME/KEY: Peptide 

(B) LOCATION: 1..51 

(D) OTHER INFORMATION: /label= misp1_repeat
/note= "see Table 1" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:20: 

Gly Ala Gly Ala Gly Gly Ala Gly Gly Tyr Gly Arg Gly Ala Gly Ala 
1 5 10 15

Gly Ala Gly Ala Ala Ala Gly Ala Gly Ala Gly Ala Gly Gly Ala Gly 
20 25 30 

Tyr Gly Gly Gln Gly Gly Tyr Gly Ala Gly Ala Gly Ala Gly Ala Ala
35 40 45 

Ala Ala Ala 
50 

(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 48 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(v) FRAGMENT TYPE: internal 



(ix) FEATURE: 

(A) NAME/KEY: Peptide 

(B) LOCATION: 1..48 

(D) OTHER INFORMATION: /label= misp1_repeat
/note= "see Table 1" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 

Gly Ala Gly Ala Gly Gly Ala Gly Gly Tyr Gly Arg Gly Ala Gly Ala 
1 5 10 15 

Gly Ala Gly Ala Ala Ala Gly Ala Gly Ala Gly Gly Tyr Gly Gly Gln
20 25 30 

Gly Gly Tyr Gly Ala Gly Ala Gly Ala Gly Ala Ala Ala Ala Ala Ala 
35 40 45 



(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 49 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide
(v) FRAGMENT TYPE: internal 



(ix) FEATURE: 

(A) NAME/KEY: Peptide 

(B) LOCATION: 1..49

(D) OTHER INFORMATION: /label= misp1_repeat
/note= "see Table 1" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 

Gly Ala Gly Ser Gly Gly Ala Gly Gly Tyr Gly Arg Gly Ala Gly Ala 
1 5 10 15

Gly Ala Gly Ala Ala Ala Gly Ala Gly Ala Gly Ala Gly Ser Tyr Gly 
20 25 30 

Gly Gln Gly Gly Tyr Gly Ala Gly Ala Gly Ala Gly Ala Ala Ala Ala
35 40 45 

Ala 



(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(v) FRAGMENT TYPE: internal 



(ix) FEATURE: 

(A) NAME/KEY: Peptide 

(B) LOCATION: 1..39

(D) OTHER INFORMATION: /label= misp1_repeat
/note= "see Table 1" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 

Gly Ala Gly Ala Gly Gly Ala Gly Gly Tyr Gly Arg Gly Ala Gly Ala 
1 5 10 15

Gly Ala Gly Ala Gly Ala Gly Ala Ala Ala Arg Ala Gly Ala Gly Ala 
20 25 30 

Gly Gly Ala Ala Ala Ala Ala 
35 

(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(v) FRAGMENT TYPE: internal

(ix) FEATURE:

(A) NAME/KEY: Peptide 

(B) LOCATION: 1..47 

(D) OTHER INFORMATION: /label= misp1_repeat
/note= "see Table 1" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 

Gly Ala Gly Ala Gly Gly Ala Gly Gly Tyr Gly Arg Gly Ala Gly Ala 
1 5 10 15

Gly Ala Gly Ala Ala Ala Gly Ala Gly Ala Gly Ala Gly Gly Tyr Gly 
20 25 30 

Gly Gln Ser Gly Tyr Gly Ala Gly Ala Gly Ala Ala Ala Ala Ala
35 40 45 

(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 55 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(v) FRAGMENT TYPE: internal 



(ix) FEATURE: 

(A) NAME/KEY: Peptide

(B) LOCATION: 1..55

(D) OTHER INFORMATION: /label= misp1_repeat
/note= "see Table 1"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:25: 

Gly Ala Gly Ala Gly Gly Ala Gly Gly Tyr Gly Arg Gly Ala Gly Ala 
1 5 10 15

Gly Ala Gly Ala Ala Ala Gly Ala Gly Ala Gly Ala Ala Ala Gly Ala 
20 25 30 

Gly Ala Gly Gly Tyr Gly Gly Gln Gly Gly Tyr Gly Ala Gly Ala Gly
35 40 45 

Ala Gly Ala Ala Ala Ala Ala 
50 55 

(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(v) FRAGMENT TYPE: internal 



(ix) FEATURE: 

(A) NAME/KEY: Peptide

(B) LOCATION: 1..47 

(D) OTHER INFORMATION: /label= misp1_repeat
/note= "see Table 1"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:26: 

Gly Ala Gly Ala Gly Gly Ala Gly Gly Tyr Gly Arg Gly Ala Gly Ala 
1 5 10 15

Gly Ala Gly Ala Ala Ala Gly Ala Gly Ala Gly Gly Tyr Gly Gly Gln
20 25 30 

Gly Gly Tyr Gly Ala Gly Ala Gly Ala Gly Ala Ala Ala Ala Ala 
35 40 45 

(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(v) FRAGMENT TYPE: internal 



(ix) FEATURE: 

(A) NAME/KEY: Peptide

(B) LOCATION: 1..50

(D) OTHER INFORMATION: /label= misp1_repeat
/note= "see Table 1" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:

Thr Gly Ala Gly Gly Ala Gly Gly Tyr Gly Arg Gly Ala Gly Ala Gly
1 5 10 15

Ala Gly Ala Ala Ala Gly Ala Gly Ala Gly Thr Gly Gly Ala Gly Tyr 
20 25 30 

Gly Gly Gln Gly Gly Tyr Gly Ala Gly Ala Gly Ala Gly Ala Ala Ala
35 40 45 

Ala Ala 
50 

(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 56 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(v) FRAGMENT TYPE: internal 



(ix) FEATURE: 

(A) NAME/KEY: Peptide 

(B) LOCATION: 1..56

(D) OTHER INFORMATION: /label= misp1_repeat
/note= "see Table 1" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:

Gly Ala Gly Ala Gly Gly Ala Gly Tyr Gly Arg Gly Ala Gly Ala Gly
1 5 10 15

Ala Gly Ala Ala Ala Gly Ala Gly Ala Gly Ala Ala Ala Gly Ala Gly 
20 25 30 

Ala Gly Ala Gly Gly Tyr Gly Gly Gln Gly Gly Tyr Gly Ala Gly Ala
35 40 45 

Arg Ala Gly Ala Ala Ala Ala Ala 
50 55 

(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 54 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(v) FRAGMENT TYPE: internal 



(ix) FEATURE: 

(A) NAME/KEY: Peptide 

(B) LOCATION: 1..54

(D) OTHER INFORMATION: /label= misp1_repeat
/note= "see Table 1"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 

Gly Ala Gly Ala Gly Gly Ala Ala Gly Tyr Ser Arg Gly Gly Arg Ala 
1 5 10 15 

Gly Ala Ala Gly Ala Gly Ala Gly Ala Ala Ala Gly Ala Gly Ala Gly 
20 25 30 

Ala Gly Gly Tyr Gly Gly Gln Gly Gly Tyr Gly Ala Gly Ala Gly Ala
35 40 45 

Gly Ala Ala Ala Ala Ala 
50 

(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 51 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(v) FRAGMENT TYPE: internal 



(ix) FEATURE: 

(A) NAME/KEY: Peptide 

(B) LOCATION: 1..51 

(D) OTHER INFORMATION: /label= misp1_repeat
/note= "see Table 1"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:

Gly Ala Gly Ser Gly Gly Ala Gly Gly Tyr Gly Arg Gly Ala Gly Ala
1 5 10 15

Gly Ala Ala Ala Gly Ala Gly Ala Ala Ala Gly Ala Gly Ala Gly Ala 
20 25 30 

Gly Gly Tyr Gly Gly Gln Gly Gly Tyr Gly Ala Gly Ala Gly Ala Ala
35 40 45 

Ala Ala Ala 
50 

(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 36 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(v) FRAGMENT TYPE: C-terminal



(ix) FEATURE: 

(A) NAME/KEY: Peptide 

(B) LOCATION: 1..36 

(D) OTHER INFORMATION: /label= misp1_repeat
/note= "see Table 1" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 

Gly Ala Gly Ala Gly Arg Gly Gly Tyr Gly Arg Gly Ala Gly Ala Gly 
1 5 10 15

Gly Tyr Gly Gly Gln Gly Gly Tyr Gly Ala Gly Ala Gly Ala Gly Ala
20 25 30 

Ala Ala Ala Ala 
35 

(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 55 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(v) FRAGMENT TYPE: internal 



(ix) FEATURE: 

(A) NAME/KEY: Peptide 

(B) LOCATION: 1..55 

(D) OTHER INFORMATION: /label= misp1_repeat

/note= "consensus sequence of MiSP1 repeats"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 

Arg Gly Ala Ala Gly Ala Ala Gly Ala Gly Ala Gly Ala Ala Ala Gly 
1 5 10 15 

Ala Gly Ala Gly Ala Gly Ala Gly Gly Tyr Gly Gly Gln Gly Gly Tyr
20 25 30

Gly Ala Gly Ala Gly Ala Gly Ala Ala Ala Ala Ala Gly Ala Gly Ala
35 40 45 

Gly Gly Ala Gly Gly Tyr Gly 
50 55 
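The consensus sequence of SEQ ID NO: 32 can be compared with the one-letter unit sequence recited in claim 1 by converting the three-letter codes. The sketch below assumes the standard three-letter to one-letter mapping; the helper name `to_one_letter` is illustrative only.

```python
# Standard three-letter to one-letter amino acid codes (Xaa = unspecified).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Xaa": "X",
}


def to_one_letter(three_letter: str) -> str:
    """Convert whitespace-separated three-letter codes to a one-letter string."""
    return "".join(THREE_TO_ONE[code] for code in three_letter.split())


# SEQ ID NO: 32 (consensus of the MiSP1 repeats), position rulers omitted.
SEQ_ID_32 = """
Arg Gly Ala Ala Gly Ala Ala Gly Ala Gly Ala Gly Ala Ala Ala Gly
Ala Gly Ala Gly Ala Gly Ala Gly Gly Tyr Gly Gly Gln Gly Gly Tyr
Gly Ala Gly Ala Gly Ala Gly Ala Ala Ala Ala Ala Gly Ala Gly Ala
Gly Gly Ala Gly Gly Tyr Gly
"""

# Unit sequence recited in claim 1.
CLAIM_1_UNIT = "RGAAGAAGAGAGAAAGAGAGAGAGGYGGQGGYGAGAGAGAAAAAGAGAGGAGGYG"

consensus = to_one_letter(SEQ_ID_32)
print(consensus == CLAIM_1_UNIT, len(consensus))
```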

(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 11 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(v) FRAGMENT TYPE: internal 



(ix) FEATURE: 

(A) NAME/KEY: Peptide 

(B) LOCATION: 1..11

(D) OTHER INFORMATION: /label= misp1_generic
/note= "generic formula for MiSP1"

(ix) FEATURE:

(A) NAME/KEY: Duplication

(B) LOCATION: 3..4

(D) OTHER INFORMATION: /label= GA

/note= "(GA) repeated 1 to 6 times"

(ix) FEATURE: 

(A) NAME/KEY: Duplication

(B) LOCATION: 5 

(D) OTHER INFORMATION: /label= A 

/note= "present as 0 to 4 residues" 

(ix) FEATURE: 

(A) NAME/KEY: Duplication

(B) LOCATION: 6..8

(D) OTHER INFORMATION: /label= GGX

/note= "X is tyrosine, glutamine or alanine; unit
is repeated 1 to 4 times."

(ix) FEATURE: 

(A) NAME/KEY: Duplication 

(B) LOCATION: 9..10

(D) OTHER INFORMATION: /label= GA 

/note= "repeated 1 to 6 times" 

(ix) FEATURE: 

(A) NAME/KEY: Duplication 

(B) LOCATION: 11 

(D) OTHER INFORMATION: /label= A 

/note= "present as 0 to 4 residues" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33: 

Gly Arg Gly Ala Ala Gly Gly Xaa Gly Ala Ala 
1 5 10

(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 amino acids 

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide
(v) FRAGMENT TYPE: internal 



(ix) FEATURE: 

(A) NAME/KEY: Peptide 

(B) LOCATION: 1..15

(D) OTHER INFORMATION: /label= MaSP1_generic

/note= "generic formula for MaSP1 protein (major
ampullate spider silk protein)."

(ix) FEATURE:

(A) NAME/KEY: Duplication

(B) LOCATION: 1..3

(D) OTHER INFORMATION: /label= XGG

/note= "X is tyrosine or glutamine; unit is
repeated 2 to 3 times"

(ix) FEATURE:

(A) NAME/KEY: Region

(B) LOCATION: 4..6

(D) OTHER INFORMATION: /label= XGA

/note= "X is tyrosine or glutamine; unit is
present once."

(ix) FEATURE:

(A) NAME/KEY: Duplication

(B) LOCATION: 7..9

(D) OTHER INFORMATION: /label= GXG

/note= "X is tyrosine or glutamine; unit is
repeated 1 to 3 times."

(ix) FEATURE:

(A) NAME/KEY: Duplication

(B) LOCATION: 10..12

(D) OTHER INFORMATION: /label= AGA

/note= "unit is repeated 5 to 7 times" 

(ix) FEATURE: 

(A) NAME/KEY: Duplication 

(B) LOCATION: 13 

(D) OTHER INFORMATION: /label= G 

/note= "present as 1 or 2 residues" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 

Xaa Gly Gly Xaa Gly Ala Gly Xaa Gly Ala Gly Ala Gly Ala Gly 
1 5 10 15 

(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(v) FRAGMENT TYPE: internal 



(ix) FEATURE: 

(A) NAME/KEY: Peptide 

(B) LOCATION: 1..14

(D) OTHER INFORMATION: /label= MaSP2_generic

/note= "generic formula for MaSP2 protein (major
ampullate spider silk protein)."

(ix) FEATURE: 

(A) NAME/KEY: Duplication 

(B) LOCATION: 1..10 

(D) OTHER INFORMATION: /label= GPG2YGPGQ2 

/note= "unit is repeated 2 or 3 times" 

(ix) FEATURE: 

(A) NAME/KEY: Duplication

(B) LOCATION: 11..12

(D) OTHER INFORMATION: /label= XX 
/note= "X is GPG or GPS" 

(ix) FEATURE: 

(A) NAME/KEY: Duplication 

(B) LOCATION: 14 

(D) OTHER INFORMATION: /label= A 

/note= "present as 7 to 10 residues" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35: 

Gly Pro Gly Gly Tyr Gly Pro Gly Gln Gln Xaa Xaa Ser Ala
1 5 10

(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 6 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide

(v) FRAGMENT TYPE: internal 



(ix) FEATURE: 

(A) NAME/KEY: Peptide 

(B) LOCATION: 1..6 

(D) OTHER INFORMATION: /label= MiSP_simple 

/note= "simplified MiSP1 generic formula; X is
tyrosine, glutamine or alan..."

(ix) FEATURE: 

(A) NAME/KEY: Duplication 

(B) LOCATION: 1..3 

(D) OTHER INFORMATION: /label= GGX

/note= "X is tyrosine, glutamine or alanine; unit 
is repeated 1 to 4 times." 

(ix) FEATURE: 

(A) NAME/KEY: Duplication 

(B) LOCATION: 4..5

(D) OTHER INFORMATION: /label= GA 

/note= "unit is present 0 to 4 times" 

(ix) FEATURE: 

(A) NAME/KEY: Duplication

(B) LOCATION: 6 

(D) OTHER INFORMATION: /label= A 

/note= "present as 1 to 6 residues" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:

Gly Gly Xaa Gly Ala Ala
1 5 

(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(v) FRAGMENT TYPE: N-terminal



(ix) FEATURE: 

(A) NAME/KEY: Peptide 

(B) LOCATION: 1..47 

(D) OTHER INFORMATION: /label= MaSP2_repeat 
/note= "see Table 2" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37: 

Gly Pro Gly Gln Gln Gly Pro Gly Gly Tyr Gly Pro Gly Gln Gln Gly
1 5 10 15

Pro Ser Gly Pro Gly Ser Ala Ala Ala Ala Ala Ala Ala Ala Ala Ala
20 25 30

Gly Pro Gly Gly Tyr Gly Pro Gly Gln Gln Gly Pro Gly Gly Tyr
35 40 45 

(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 38 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(v) FRAGMENT TYPE: internal 



(ix) FEATURE: 

(A) NAME/KEY: Peptide 

(B) LOCATION: 1..38 

(D) OTHER INFORMATION: /label= MaSP2_repeat 
/note= "see Table 2" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:38: 

Gly Pro Gly Gln Gln Gly Pro Gly Arg Tyr Gly Pro Gly Gln Gln Gly
1 5 10 15

Pro Ser Gly Pro Gly Ser Ala Ala Ala Ala Ala Ala Gly Ser Gly Gln
20 25 30

Gln Gly Pro Gly Gly Tyr
35 

(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 52 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(v) FRAGMENT TYPE: internal 



(ix) FEATURE: 

(A) NAME/KEY: Peptide 

(B) LOCATION: 1..52

(D) OTHER INFORMATION: /label= MaSP2_repeat 
/note= "see Table 2" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39: 

Gly Pro Arg Gln Gln Gly Pro Gly Gly Tyr Gly Gln Gly Gln Gln Gly
1 5 10 15

Pro Ser Gly Pro Gly Ser Ala Ala Ala Ala Ser Ala Ala Ala Ser Ala
20 25 30

Glu Ser Gly Gln Gln Gly Pro Gly Gly Tyr Gly Pro Gly Gln Gln Gly
35 40 45 

Pro Gly Gly Tyr 
50 

(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 40 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(v) FRAGMENT TYPE: internal 



(ix) FEATURE: 

(A) NAME/KEY: Peptide 

(B) LOCATION: 1..40 

(D) OTHER INFORMATION: /label= MaSP2_repeat
/note= "see Table 2"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40:

Gly Pro Gly Gln Gln Gly Pro Gly Gly Tyr Gly Pro Gly Gln Gln Gly
1 5 10 15 

Pro Ser Gly Pro Gly Ser Ala Ala Ala Ala Ala Ala Ala Ser Gly Pro 
20 25 30 

Gly Gln Gln Gly Pro Gly Gly Tyr
35 40 

(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 41 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide

(v) FRAGMENT TYPE: internal

(ix) FEATURE: 

(A) NAME/KEY: Peptide 

(B) LOCATION: 1..41 

(D) OTHER INFORMATION: /label= MaSP2_repeat
/note= "see Table 2"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41:

Gly Pro Gly Gln Gln Gly Pro Gly Gly Tyr Gly Pro Gly Gln Gln Gly
1 5 10 15 

Pro Ser Gly Pro Gly Ser Ala Ala Ala Ala Ala Ala Ala Ala Ser Gly 
20 25 30 

Pro Gly Gln Gln Gly Pro Gly Gly Tyr
35 40 

(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(v) FRAGMENT TYPE: internal 

(ix) FEATURE: 

(A) NAME/KEY: Peptide 

(B) LOCATION: 1..29 

(D) OTHER INFORMATION: /label= MaSP2_repeat 
/note= "see Table 2" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42: 

Gly Pro Gly Gln Gln Gly Pro Gly Gly Tyr Gly Pro Gly Gln Gln Gly
1 5 10 15

Leu Ser Gly Pro Gly Ser Ala Ala Ala Ala Ala Ala Ala 
20 25 

(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 36 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(v) FRAGMENT TYPE: internal 

(ix) FEATURE: 

(A) NAME/KEY: Peptide 

(B) LOCATION: 1..36 

(D) OTHER INFORMATION: /label= MaSP2_repeat 
/note= "see Table 2"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43:

Gly Pro Gly Gln Gln Gly Pro Gly Gly Tyr Gly Pro Gly Gln Gln Gly
1 5 10 15

Pro Ser Gly Pro Gly Ser Ala Ala Ala Ala Ala Ala Ala Ala Ala Gly 
20 25 30 

Pro Gly Gly Tyr 
35 

(2) INFORMATION FOR SEQ ID NO: 44: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(v) FRAGMENT TYPE: internal 



(ix) FEATURE: 

(A) NAME/KEY: Peptide 

(B) LOCATION: 1..39 

(D) OTHER INFORMATION: /label= MaSP2_repeat 
/note= "see Table 2" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44: 

Gly Pro Gly Gln Gln Gly Pro Gly Gly Tyr Gly Pro Gly Gln Gln Gly
1 5 10 15 

Pro Ser Gly Ala Gly Ser Ala Ala Ala Ala Ala Ala Ala Gly Pro Gly 
20 25 30 

Gln Gln Gly Leu Gly Gly Tyr
35 

(2) INFORMATION FOR SEQ ID NO: 45: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(v) FRAGMENT TYPE: internal 



(ix) FEATURE: 

(A) NAME/KEY: Peptide 

(B) LOCATION: 1..32 

(D) OTHER INFORMATION: /label= MaSP2_repeat 
/note= "see Table 2"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45: 

Gly Pro Gly Gln Gln Gly Pro Gly Gly Tyr Gly Pro Gly Gln Gln Gly
1 5 10 15 

Pro Gly Gly Tyr Gly Pro Gly Ser Ala Ser Ala Ala Ala Ala Ala Ala 
20 25 30

(2) INFORMATION FOR SEQ ID NO: 46:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 37 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(v) FRAGMENT TYPE: internal 

(ix) FEATURE: 

(A) NAME/KEY: Peptide 

(B) LOCATION: 1..37 

(D) OTHER INFORMATION: /label= MaSP2_repeat 
/note= "see Table 2"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46:

Gly Pro Gly Gln Gln Gly Pro Gly Gly Tyr Gly Pro Gly Gln Gln Gly
1 5 10 15

Pro Ser Gly Pro Gly Ser Ala Ser Ala Ala Ala Ala Ala Ala Ala Ala 
20 25 30 

Gly Pro Gly Gly Tyr 
35 

(2) INFORMATION FOR SEQ ID NO: 47: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 37 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(v) FRAGMENT TYPE: internal 

(ix) FEATURE: 

(A) NAME/KEY: Peptide

(B) LOCATION: 1..37 

(D) OTHER INFORMATION: /label= MaSP2_repeat 
/note= "see Table 2" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:47: 

Gly Pro Gly Gln Gln Gly Pro Gly Gly Tyr Ala Pro Gly Gln Gln Gly
1 5 10 15

Pro Ser Gly Pro Gly Ser Ala Ser Ala Ala Ala Ala Ala Ala Ala Ala 
20 25 30 

Gly Pro Gly Gly Tyr 
35 

(2) INFORMATION FOR SEQ ID NO:48: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 36 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(v) FRAGMENT TYPE: C-terminal

(ix) FEATURE: 

(A) NAME/KEY: Peptide 

(B) LOCATION: 1..36 

(D) OTHER INFORMATION: /label= MaSP2_repeat 
/note= "see Table 2" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:48: 

Gly Pro Gly Gln Gln Gly Pro Gly Gly Tyr Ala Pro Gly Gln Gln Gly
1 5 10 15

Pro Ser Gly Pro Gly Ser Ala Ala Ala Ala Ala Ala Ala Ser Ala Gly 
20 25 30 

Pro Gly Gly Tyr 
35 

(2) INFORMATION FOR SEQ ID NO: 49: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 37 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(v) FRAGMENT TYPE: internal 

(ix) FEATURE: 

(A) NAME/KEY: Peptide 

(B) LOCATION: 1..37 

(D) OTHER INFORMATION: /label= MaSP2_consensus 

/note= "consensus sequence of MaSP2 repeat units"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49:

Gly Pro Gly Gln Gln Gly Pro Gly Gly Tyr Gly Pro Gly Gln Gln Gly
1 5 10 15 

Pro Ser Gly Pro Gly Ser Ala Ala Ala Ala Ala Ala Ala Ala Ala Ala 
20 25 30 

Gly Pro Gly Gly Tyr 
35 

(2) INFORMATION FOR SEQ ID NO: 50: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 84 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..84 

(D) OTHER INFORMATION: /label= oligonucleotide 
/note= "S2 long oligo" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50: 
TCTAGCCCGG GTGGCTATGG TCCTGGACAG CAAGGTCCTG GCGGTTACGG TCCTGGCCAA 60 
CAGGGTCCCT CTGGTCCAGG CAGT 84 
(2) INFORMATION FOR SEQ ID NO: 51: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 59 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(ix) FEATURE: 

(A) NAME/KEY: -

(B) LOCATION: 1..59

(D) OTHER INFORMATION: /label= oligonucleotide
/note= "S2 short oligo"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51:
TCCGGACCTG CTGCGGCGGC TGCGGCAGCT GCACTGCCTG GACCAGAGGG ACCCTGTTG 59 
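
The listing does not state how the two S2 oligonucleotides are used together. Assuming they are meant to anneal through complementary 3' ends, the following sketch measures the length of that complementarity; the function names are illustrative only.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def three_prime_overlap(first: str, second: str) -> int:
    """Length of the longest 3' end of `first` that pairs with the 3' end of `second`."""
    rc = reverse_complement(second)
    for n in range(min(len(first), len(rc)), 0, -1):
        if first.endswith(rc[:n]):
            return n
    return 0


# SEQ ID NO: 50 ("S2 long oligo") and SEQ ID NO: 51 ("S2 short oligo"),
# with the spacing and position numbers of the listing removed.
S2_LONG = (
    "TCTAGCCCGGGTGGCTATGGTCCTGGACAGCAAGGTCCTGGCGGTTACGGTCCTGGCCAA"
    "CAGGGTCCCTCTGGTCCAGGCAGT"
)
S2_SHORT = "TCCGGACCTGCTGCGGCGGCTGCGGCAGCTGCACTGCCTGGACCAGAGGGACCCTGTTG"

overlap = three_prime_overlap(S2_LONG, S2_SHORT)
print(len(S2_LONG), len(S2_SHORT), overlap)
```

Under that assumption the two strands share a complementary stretch at their 3' ends, which is the arrangement typically used when two oligonucleotides are annealed and extended into a longer duplex.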
(2) INFORMATION FOR SEQ ID NO: 52: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 35 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide

(v) FRAGMENT TYPE: internal 



(ix) FEATURE: 

(A) NAME/KEY: Peptide 

(B) LOCATION: 1..35

(D) OTHER INFORMATION: /label= MaSP2_repeat 

/note= "basic repeat unit of MaSP2 protein" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52:

Pro Gly Gly Tyr Gly Pro Gly Gln Gln Gly Pro Gly Gly Tyr Gly Pro
1 5 10 15

Gly Gln Gln Gly Pro Ser Gly Pro Gly Ser Ala Ala Ala Ala Ala Ala
20 25 30 

Ala Ala Gly 
35 

(2) INFORMATION FOR SEQ ID NO: 53: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 13 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..13 

(D) OTHER INFORMATION: /label= enzyme_site

/note= "generic recognition site for Sfi I 
restriction enzyme" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53: 
GGCCNNNNNG GCC 

(2) INFORMATION FOR SEQ ID NO: 54: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..13

(D) OTHER INFORMATION: /label= Sfijr_site 

/note= "top strand of synthetic Sfi I/AlwN I 
linker" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54: 
GGCCGCAGCG GCC 
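
The linker top strand of SEQ ID NO: 54 can be checked against the generic Sfi I recognition site of SEQ ID NO: 53, in which N stands for any nucleotide. A minimal sketch follows, with helper names that are illustrative and not taken from the specification.

```python
import re

SFI_GENERIC_SITE = "GGCCNNNNNGGCC"  # SEQ ID NO: 53
LINKER_TOP_STRAND = "GGCCGCAGCGGCC"  # SEQ ID NO: 54


def site_to_regex(site: str) -> "re.Pattern[str]":
    """Turn a recognition site containing N wildcards into a compiled regex."""
    return re.compile("".join("[ACGT]" if base == "N" else base for base in site))


def matches_site(sequence: str, site: str) -> bool:
    """True if the whole sequence is an instance of the recognition site."""
    return site_to_regex(site).fullmatch(sequence) is not None


print(matches_site(LINKER_TOP_STRAND, SFI_GENERIC_SITE))
```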

(2) INFORMATION FOR SEQ ID NO: 55: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(v) FRAGMENT TYPE: internal 



(ix) FEATURE: 

(A) NAME/KEY: Peptide 

(B) LOCATION: 1..4 

(D) OTHER INFORMATION: /label= linker_peptide

/note= "amino acids encoded by Sfi I/AlwN I 
linker" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55:

Ala Ala Ala Ala
1 

(2) INFORMATION FOR SEQ ID NO: 56: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6 amino acids 

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide 

(v) FRAGMENT TYPE: internal 



(ix) FEATURE: 

(A) NAME/KEY: Peptide 

(B) LOCATION: 1..6 

(D) OTHER INFORMATION: /label= NBS_peptides
/note= "see discussion page 13"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:56: 

Gly Gly Gln Gly Gly Tyr
1 5

CLAIMS

What is claimed is: 

1. A polypeptide having an amino acid sequence
comprising repeats of the unit amino acid sequence

RGAAGAAGAGAGAAAGAGAGAGAGGYGGQGGYGAGAGAGAAAAAGAGAGGAGGYG.

2. A polypeptide having the amino acid sequence
shown in Figure 1.

3. A polypeptide comprising one or more repeats of
a unit amino acid repeat sequence selected from the
group consisting of
GAAGAGGYGRGAGGYGGQGGYGAGAGAGAAAAA,
GAGAGGAGGYGRGAGAGAGAAAGAGAGAGGAGYGGQGGYGAGAGAGAAAAA,
GAGAGGAGGYGRGAGAGAGAAAGAGAGGYGGQGGYGAGAGAGAAAAAA,
GAGSGGAGGYGRGAGAGAGAAAGAGAGAGSYGGQGGYGAGAGAGAAAAA,
GAGAGGAGGYGRGAGAGAGAGAGAAARAGAGAGGAAAAA,
GAGAGGAGGYGRGAGAGAGAGAGAAARAGAGAGG,
GAGAGGAGGYGRGAGAGAGAAAGAGAGAGGYGGQSGYGAGAGAAAAA,
GAGAGGAGGYGRGAGAGAGAAAGAGAGAAAGAGAGGYGGQGGYGAGAGAGAAAAA,
GAGAGGAGGYGRGAGAGAGAAAGAGAGGYGGQGGYGAGAGAGAAAAA,
TGAGGAGGYGRGAGAGAGAAAGAGAGTGGAGYGGQGGYGAGAGAGAAAAA,
GAGAGGAGYGRGAGAGAGAAAGAGAGAAAGAGAGAGGYGGQGGYGAGARAGAAAAA,
GAGAGGAAGYSRGGRAGAAGAGAGAAAGAGAGAGGYGGQGGYGAGAGAGAAAAA,
GAGSGGAGGYGRGAGAGAAAGAGAAAGAGAGAGGYGGQGGYGAGAGAAAAA, and
GAGAGRGGYGRGAGAGGYGGQGGYGAGAGAGAAAAA.

4. A polypeptide according to claim 3, wherein all
repeats are of the same unit amino acid repeat
sequence.

5. A polypeptide according to claim 3 further
comprising an amino terminal polypeptide having the
amino acid sequence

NNLLFAVSGYVSTLGNAISDASAYANAL
SSAIGNVLANSGSISESTASSAASSAAS
SVTTT
LTSYGPAVFYAPSASSGGYGAGAGAVAA
AGAAGAGGYGRGAGGYGGQGGYGAGAGA
GAAAAAGAGAGGAGGYGRGAGAGAGAAA
GAGAGAGGA.

6. A polypeptide according to claim 3 further 
comprising an amino terminal polypeptide having the 
amino acid sequence 

SYGPSVFYTPTSAGSYGAGAGAFGAGAS
AGVGAGAGTVAGYGGQGGYGAGAGSAGG
YGRGTGAGAAAGAGAGATAGAGAGAAAG
AGAGAG.

7. A polypeptide according to claim 3 further 
comprising a carboxy terminal polypeptide having the 
amino acid sequence 

DKEIACWSRCRYTVASTTSRLSSAEASS
RISSAASTLVSGGYLNTAALPSVISDLF
AQVGAS
SPVIRQRSLIQVLLEIVSSLIHILSSSS
VGWVDFSSVGSSAAAVGQSMQVVMG.

8. A polypeptide according to claim 3 further 
comprising a carboxy terminal polypeptide having the 
amino acid sequence 

GGYASTGVGIGSTVTSTTSRLSSAEACS 
RISAAASTLVSGGSLNTAALPSVISDLF
AQVSASSPGVSGNEVLIQVLLEIVSSLI
HILSSSSVGQVDFSSVGSSAAAVGQSMQ
VVMG.



9. A polypeptide according to claim 5 further
comprising a carboxy terminal polypeptide having the
amino acid sequence

DKEIACWSRCRYTVASTTSRLSSAEASS
RISSAASTLVSGGYLNTAALPSVISDLF
AQVGAS
SPVIRQRSLIQVLLEIVSSLIHILSSSS
VGWVDFSSVGSSAAAVGQSMQVVMG.

10. A polypeptide according to claim 5 further
comprising a carboxy terminal polypeptide having the
amino acid sequence

GGYASTGVGIGSTVTSTTSRLSSAEACS
RISAAASTLVSGGSLNTAALPSVISDLF
AQVSASSPGVSGNEVLIQVLLEIVSSLI
HILSSSSVGQVDFSSVGSSAAAVGQSMQ
VVMG.

11. An isolated DNA molecule encoding a polypeptide 
of any one of claims 1, 3 and 10. 

12. An isolated DNA molecule having the nucleotide
sequence of Figure 1.

13. A fiber comprising an aggregate of polypeptides 
according to any one of claims 1 through 10. 

14. A fiber comprising an aggregate of polypeptides
according to claim 7 and polypeptides according to claim
8.

15. A fiber comprising an aggregate of polypeptides 
according to claim 9 and polypeptides according to claim 
10. 
16. A host cell transformed with a DNA according
to claim 11.

17. A host cell transformed with a DNA according 
to claim 12. 

18. A polypeptide comprising repeats of an amino
acid sequence having the generic formula

(GR)(GA)_l(A)_m(GGX)_n(GA)_l(A)_m

where X is tyrosine, glutamine or alanine and
where l = 1 to 6, m = 0 to 4 and n = 1 to 4.
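By way of illustration only, the generic formula of claim 18 can be read as a pattern over one-letter amino acid codes. The sketch below is not part of the claim. It assumes that the two (GA) runs and the two (A) runs may each take any length within the stated ranges independently of one another, which the claim text does not settle, and the names REPEAT_UNIT and fits_generic_formula are hypothetical.

```python
import re

# Assumption: each (GA) run and each (A) run varies independently within
# its claimed range; X is restricted to Y (tyrosine), Q (glutamine) or
# A (alanine) as recited in the claim.
REPEAT_UNIT = re.compile(
    r"GR"                # fixed (GR) start
    r"(?:GA){1,6}"       # first (GA) run, 1 to 6 copies
    r"A{0,4}"            # first (A) run, 0 to 4 residues
    r"(?:GG[YQA]){1,4}"  # (GGX) run, 1 to 4 copies
    r"(?:GA){1,6}"       # second (GA) run, 1 to 6 copies
    r"A{0,4}"            # second (A) run, 0 to 4 residues
)


def fits_generic_formula(unit: str) -> bool:
    """Return True if the whole string is one repeat unit of the pattern."""
    return REPEAT_UNIT.fullmatch(unit) is not None


if __name__ == "__main__":
    print(fits_generic_formula("GRGAGAGAAAGGYGGQGAGAAAAA"))  # True
    print(fits_generic_formula("GRGGYGAGA"))                 # False: no (GA) run after GR
```

A regular expression is used here only because the formula is a fixed order of bounded runs; a stricter reading in which both (GA) runs share one length would need a different check.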

19. An isolated DNA molecule encoding a polypeptide
of claim 18.

20. A host cell transformed with a DNA molecule 
according to claim 19. 

21. A polypeptide comprising repeats of an amino
acid sequence having the generic formula:

(GGX)_n(GA)_m(A)_l

where X is tyrosine, glutamine or alanine and
where l = 1 to 6, m = 0 to 4 and n = 1 to 4.
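As a non-limiting illustration, a repeat unit satisfying the generic formula of claim 21 can be assembled from its three parameters. The sketch below is not part of the claim; the function name build_repeat and its argument names are hypothetical, and the choice of X at each (GGX) position is passed in explicitly because the claim allows it to differ from copy to copy.

```python
def build_repeat(x_residues: str, ga_count: int, a_count: int) -> str:
    """Assemble one unit: one GGX per letter of x_residues, then GA runs, then A."""
    n = len(x_residues)
    if not 1 <= n <= 4:
        raise ValueError("n (number of GGX copies) must be 1 to 4")
    if any(x not in "YQA" for x in x_residues):
        raise ValueError("X must be Y (tyrosine), Q (glutamine) or A (alanine)")
    if not 0 <= ga_count <= 4:
        raise ValueError("m (number of GA copies) must be 0 to 4")
    if not 1 <= a_count <= 6:
        raise ValueError("l (number of A residues) must be 1 to 6")
    ggx_run = "".join("GG" + x for x in x_residues)
    return ggx_run + "GA" * ga_count + "A" * a_count


if __name__ == "__main__":
    # n = 2 (X = Y, then Q), m = 2, l = 5
    print(build_repeat("YQ", 2, 5))  # GGYGGQGAGAAAAAA
```

Concatenating several such units gives a candidate repetitive region under this reading of the formula.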

22. An isolated DNA molecule encoding a polypeptide
of claim 21.

23. A host cell transformed with a DNA molecule 
according to claim 22. 

24. An isolated DNA molecule comprising a
polynucleotide that will hybridize to a DNA molecule
having the sequence of Figure 1A-1F under conditions
obtained by a solution of 6X SSC or SSPE, 5X Denhardt's
solution, 0.5% SDS at a temperature of about 68°C, or
under conditions obtained by the said solution that is
made 50% in formamide at a temperature of about 42°C.
10 20 30 40 50

********** 

ACATA CTAGG TTTGG TGCCG GAGCT GGAGC TGGTA CGTCT GTGCA GAAAT 

60 70 80 90 100 

********** 

ACTTT GCACA TCACT TCTCC AATTG CTTCT CGGGT ATTTG TCAAA TGATT 

110 120 130 140 150 

***** ***** 

AGTTC TACAA CTTCT ACTGA TCATG CAGTA AGTGT TGCTA CGAGC GTTGC 

160 170 180 190 

****** * * * 

GCTGA AGTCA GCTTG GACTT GATGC AAATG CT ATG AAC AAC TTA CTA 

M N N L L> 



200 210 220 230 240 

********* 

GGT GCC GTT AGT GGA TAT GTT TCG ACA CTA GGC AAC GCT ATT TCT 
GAVSGYVSTLGNAIS> 



250 260 270 280 

***** **** 

GAT GCT TCG GCA TAC GCA AAT GCT CTT TCT TCC GCT ATA GGA AAT 
DASAYANALSSAIGN>



290 300 310 320 330 

********* 

GTG TTA GCT AAT TCC GGT TCA ATT AGC GAA AGC ACT GCA TCT TCT 
VLANSGSISESTASS>



340 350 360 370 

********* 

GCT GCT TCC AGT GCT GCT TCT TCA GTC ACT ACA ACT TTG ACG TCT 
AASSAASSVTTTLTS> 

380 390 400 410 420 

********* 

TAT GGA CCA GCT GTA TTT TAC GCA CCT TCT GCA TCA TCT GGA GGC 
YGPAVFYAPSASSGG> 



430 440 450 460 

********* 

TAT GGA GCT GGA GCT GGA GCT GTT GCT GCA GCA GGA GCT GCC GGC 
YGAGAGAVAAAGAAG>



470 480 490 500 510

GCT GGA GGT TAC GGA AGA GGT GCT GGA GGC TAC GGT GGA CAA GGA
AGGYGRGAGGYGGQG>

FIG. 1A
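For illustration, the one-letter translation printed beneath each codon row of Figure 1 can be checked against the standard genetic code. The sketch below is an aid to the reader and not part of the disclosure. It assumes the standard code, laid out in the conventional TCAG ordering; the names CODON_TABLE and translate are hypothetical. Any codon containing a character other than A, C, G or T, such as the NNN placeholders in the figure, is reported as X.

```python
from itertools import product

# Standard genetic code with codons enumerated in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ...); "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA string read in frame from its first base.

    Whitespace is ignored, a trailing partial codon is dropped, and any
    codon not in the table (for example NNN) is reported as X.
    """
    seq = "".join(dna.split()).upper()
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


if __name__ == "__main__":
    # First coding row of Figure 1A, starting at the ATG.
    print(translate("ATG AAC AAC TTA CTA"))  # MNNLL
```

Running the same function over a full row, for example the fifteen codons beginning GGT GCC GTT, gives GAVSGYVSTLGNAIS, in agreement with the translation printed in the figure.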
520 530 540 550

GGA TAT GGT GCC GGA GCC GGA GCT GGT GCT GCT GCA GCT GCT GGA
GYGAGAGAGAAAAAG>

560 570 580 590 600

GCA GGA GCC GGA GGC GCT GGT GGT TAC GGT AGA GGT GCT GGT GCT
AGAGGAGGYGRGAGA>



610 620 630 640 

********* 

GGA GCT GGT GCG GCT GCT GGG GCA GGT GCA GGC GCC GGT GGT GCT 
GAGAAAGAGAGAGGA> 

650 660 670 680 690 

********* 

GGA TAT GGT GGA CAA GGC GGA TAT GGT GCC GGA GCA GGA GCT GGT 
GYGGQGGYGAGAGAG> 

700 710 720 730

* * * * * * * * *

GCG GCT GCT GCT GCT GGT GCA GGA GCA GGA GGT GCT GGC GGT TAC
AAAAAGAGAGGAGGY>

740 750 760 770 780 

***** **** 

GGT AGA GGT GCT GGT GCT GGA GCA GGA GCC GCT GCG GGT GCT GGA 
GRGAGAGAGAAAGAG> 

790 800 810 820 

* * ** * ** * * 

GCT GGA GGC TAC GGT GGT CAA GGT GGG TAC GGT GCC GGA GCA GGA 
AGGYGGQGGYGAGAG> 

830 840 850 860 870 

***** **** 

GCT GGT GCG GCT GCT GCT GCT GCT GGA GCA GGA TCT GGA GGC GCT 
AGAAAAAAGAGSGGA>

880 890 900 910 

***** **** 

GGC GGT TAC GGT AGA GGT GCT GGT GCT GGA GCT GGA GCC GCT GCA 
GGYGRGAGAGAGAAA>

920 930 940 950 960 

***** **** 

GGT GCA GGA GCA GGA GCT GGA AGC TAC GGT GGT CAA GGA TAC GGT 
GAGAGAGSYGGQGYG> 

970 980 990 1000 

***** **** 

GCC GGA GCA GGA GCT GGT GCT GCT GCA GCT GCA NNN NNN NNN NNN 
AGAGAGAAAAA 
1010 1020 1030 1040

******** 

NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN GGT GCA GGT GCA 

GAG A> 

1050 1060 1070 1080 1090 

** * ** * ** * 

GGT GCT GGA TAT GGT GGA CAA GGC GGA TAT GGT GCC GGA GCA GGA 
GAGYGGQGGYGAGAG> 

1100 1110 1120 1130 

** * ** * ** * 

GCT GGT GCG GCT GCT GCT GCT GGT GCA GGA GCT GGA GGT GCT GGT 
AGAAAAAGAGAGGAG> 

1140 1150 1160 1170 1180 

** * ** * ** * 

GGT TAC GGT AGA GGT GCT GGT GCT GGA GCT GGA GCC GCT GCA GGT 
GYGRGAGAGAGAAAG> 

1190 1200 1210 1220 

** * ** * ** * 

GCA GGA GCA GGA GCT GGA GGC TAC GGT GGT CAA AGT GGA TAC GGT 
AGAGAGGYGGQSGYG>

1230 1240 1250 1260 1270 

*** * * * *** 

GCC GGA GCA GGA GCT GCT GCA GCT GCT GGA GCA GGA GCT GGA GGC 
AGAGAAAAAGAGAGG> 

1280 1290 1300 1310 

** * * * * * * * 

GCT GGT GGT TAC GGT GA GGT GCT GGT GCT GGA GCA GGA GCC GCT 
AGGYGRGAGAGAGAA>

1320 1330 1340 1350 1360 

** * ** * ** * 

GCG GGT GCT GGA GCA GGA GCC GCT GCG GGT GCA GGA GCT GGA GGC 

AGAGAGAAAGAGAGG> 

1370 1380 1390 1400 

****** *** 

TAC GGT GGT CAA GGT GGG TAC GGT GCC GGT GCA GGA GCT GGT GCG 
YGGQGGYGAGAGAGA>

1410 1420 1430 1440 1450 

** * ** * ** * 

GCT GCT GCT GCT GGA GCA GGA GCT GGA GGC GCT GGT GGT TAC GGT 
AAAAGAGAGGAGGYG> 

1460 1470 1480 1490 

****** ** * 

AGA GGT GCT GGT GCT GGA GCT GGA GCT GCT GCA GGC GCA GGA GCT 

RGAGAGAGAAAGAGA> 
1500 1510 1520 1530 1540

** * * * * * * * 

GGA GGC TAC GGT GGT CAA GGT GGA TAC GGT GCC GGA GCA GGA GCT 
GGYGGQGGYGAGAGA> 

1550 1560 1570 1580 

* * * * * * * * * 

GGT GCT GCT GCA GCT GCT GCA ACA GGA GCC GGA GGC GCT GGT GGT 
GAAAAAATGAGGAGG> 

1590 1600 1610 1620 1630 

* * * * * * * * * 

TAC GGT AGA GGT GCT GGT GCT GGA GCT GGT GCC GCT GCT GGG GCA 
YGRGAGAGAGAAAGA> 

1640 1650 1660 1670 

* * * * * * * * * 

GGT GCA GGC ACC GGT GGT GCT GGA TAT GGT GGA CAA GGC GGT TAT 
GAGTGGAGYGGQGGY>

1680 1690 1700 1710 1720 

** * * * * * * * 

GGT GCC GGA GCA GGA GCT GGT GCG GCT GCT GCT GCT GGT GCA GGA 
GAGAGAGAAAAAGAG>

1730 1740 1750 1760 

* * * * * * * * * 

GCA GGA GGT GCT GGT TAC GGT AGA GGT GCT GGT GCT GGA GCT GGA 
AGGAGYGRGAGAGAG>

1770 1780 1790 1800 1810 

* * * * * * * * *

GCT GCT GCA GGT GCT GGA GCT GGA GCC GCT GCA GGT GCA GGA GCA 
AAAGAGAGAAAGAGA> 

1820 1830 1840 1850 

* * * * * * * * * 

GGA GCT GGA GGC TAC GGT GGT CAG GGT GGA TAC GGT GCC GGA GCA 
GAGGYGGQGGYGAGA>

1860 1870 1880 1890 1900 

* * * ** * ** * 

AGA GCT GGT GCT GCG GCA GCT GCT GGA GCA GGA GCT GGA GGC GCT 
RAGAAAAAGAGAGGA> 

1910 1920 1930 1940 

* * * * * * * + * 

GCG GGT TAC AGT AGA GGT GGT CGT GCA GGA GCC GCT GGT GCT GGA 
AGYSRGGRAGAAGAG> 

1950 1960 1970 1980 1990 

* * * * * * ** * 

GCT GGA GCC GCT GCA GGT GCA GGA GCA GGA GCT GGA GGC TAC GGT 
AGAAAGAGAGAGGYG> 
2000 2010 2020 2030

GGT CAA GGT GGA TAC GGT GCC GGA GCA GGA GCT GGT GCT GCT GCA
GQGGYGAGAGAGAAA>

2040 2050 2060 2070 2080

GCT GCT GGT GCA GGA TCC GGA GGC GCT GGT GGT TAC GGT AGA GGT
AAGAGSGGAGGYGRG>

2090 2100 2110 2120

GCT GGT GCT GGA GCC GCT GCA GGA GCT GGA GCC GCT GCA GGT GCT
AGAGAAAGAGAAAGA>

2130 2140 2150 2160 2170

GGA GCA GGA GCT GGA GGC TAC GGT GGT CAA GGT GGA TAC GGT GCC
GAGAGGYGGQGGYGA>

2180 2190 2200 2210

GGA GCA GGA GCT GCT GCA GCT GCT GGA GCA GGA GCC GGA CGT GGA
GAGAAAAAGAGAGRG>

2220 2230 2240 2250 2260

GGT TAC GGA AGA GGT GCT GGT GCT GGA GGC TAC GGT GGA CAA GGA
GYGRGAGAGGYGGQG>

2270 2280 2290 2300

GGA TAT GGT GCC GGA GCT GGA GCC GGT GCT GCT GCA GCT GCT GGA
GYGAGAGAGAAAAAG>

2310 2320 2330 2340 2350

GCG GGA GCC GGA GGC TAT GGC GAC AAG GAG ATA GCC TGC TGG AGC
AGAGGYGDKEIACWS>

2360 2370 2380 2390

AGG TGT AGA TAC ACT GTT GCC TCC ACA ACA TCT CGT TTG AGT TCG
RCRYTVASTTSRLSS>

2400 2410 2420 2430 2440

GCC GAA GCA TCT TCT AGG ATA TCG TCG GCG GCT TCC ACT TTA GTA
AEASSRISSAASTLV>



2450 2460 2470 2480 

★ * * * * * * * * 

TCT GGA GGT TAC TTG AAT ACA GCA GCT CTG CCA TCG GTT ATT TCG 
SGGYLNTAALPSVIS>
2490 2500 2510 2520 2530

* * * * * * * * *

GAT CTT TTT GCC CAA GTT GGT GCA TCT TCT CCG GTG ATC AGA CAG
DLFAQVGASSPVIRQ>

2540 2550 2560 2570 

* * * * * * * * * *

CGA AGT TTG ATC CAA GTT TTG TTG GAA ATT GTT TCT TCT CTT ATC 
RSLIQVLLEIVSSLI> 

2580 2590 2600 2610 2620 

* * * * * * *★ * 

CAT ATT CTC AGT TCT TCT AGC GTA GGA CAA GTC GAT TTC AGT TCG 
HILSSSSVGQVDFSS> 

2630 2640 2650 2660 

* * * ** * * * ★ 

GTT GGG TCG TCT GCT GCA GCT GTT GGT CAA TCC ATG CAA GTT GTA 
VGSSAAAVGQSMQVV> 

2670 2680 2690 2700 2710 

******** ** 

ATG GGC TAA ACAT GATGG TTCTC TCAAT TATGT ATTCT TTAAT TACCG 
M G *> 

2720 2730 2740 2750 2760 

********** 

CTAAG GTAGC AAAAT ATTGT AAAGT AAAGT TTTCT TACAA AATAA AAATT 



2770 2780 2790 

***** 

CTTTT CTGCA AAAAA AAAAA AAAAA AA 
10 20 30 40

TCT TAT GGA CCA TCC GTA TTT TAC ACT CCT ACT TCA GCT GGA AGC
SYGPSVFYTPTSAGS>

50 60 70 80 90

TAT GGT GCA GGG GCC GGA GGT TTT GGA GCT GGA GCC TCT GCT GGT
YGAGAGGFGAGASAG>

100 110 120 130

GTC GGA GCC GGA GCT GGT ACT GTA GCA GGA TAT GGT GGA CAA GGA
VGAGAGTVAGYGGQG>

140 150 160 170 180

GGA TAT GGT GCC GGA AGC GCT GGA GGT TAT GGA AGA GGT ACT GGA
GYGAGSAGGYGRGTG>

190 200 210 220

GCT GGA GCC GCT GCT GGT GCC GGA GCA GGA GCC ACT GCT GGT GCC
AGAAAGAGAGATAGA>

230 240 250 260 270

GGA GCA GGA GCC GCT GCT GGT GCC GGA GCA GGA GCA GGT AAT TCA
GAGAAAGAGAGAGNS>

280 290 300

GGA GGA TAT AGT GCC GGA GTA GGA GTT GGT GCT GCA GCT
GGYSAGVGVGAAA>
10 20 30 40

* * * * * * * *

CT GCA GCT GCT GGA GGA GGT GCC GGA ACT GTT GGA GGT TAC GGA
AAAGGGAGTVGGYG>

50 60 70 80 

* * * * * * * * * 

AGA GGT GCT GGT GTA GGA GCA GGT GCC GCT GCT GGT TTT GCG GCA 
RGAGVGAGAAAGFAA>

90 100 110 120 130

* * * * * * * * *

GGA GCT GGT GGT GCT GGA GGC TAC AGA AGA GAT GGA GGA TAC GGT
GAGGAGGYRRDGGYG>

140 150 160 

* * * * * * * 

GCT GGA GCA GGA GCT GGA GCT GCT GCA GCT G 
AGAGAGAAAAX> 



FIG. 2B 
10 20 30 40

★ * * * * * * * * 

GGT GCA GGA GGC TAT GGA AGA GGT GCT GGA GCT GGA GCT GCT GCA 
GAGGYGRGAGAGAA A> 



50 60 70 80 90 

* * * * * * * * * 

GTC GCA GGT GCA GAT GCT GGT GGC TAT GGA AGA AAT TAT GGT GCT 
VAGADAGGYGRNYG A> 

100 110 120 130 

* * * * * * * * * 

GGA ACC ACT GCT TAT GCA GGA GCC AGA GCC GGT GGT GCT GGA GGC 
GTTAYAGARAGGAG G> 



140 150 160 170 180

TAT GGC GGA CAA GGA GGA TAT TCT TCT GGA GCC GGT GCT GCT GCA
YGGQGGYSSGAGAAA>

190 200 210 220

GCT TCT GGA GCA GGA GCC GAT ATC ACT AGT GGA TAC GGA AGA GGT
ASGAGADITSGYGRG>

230 240 250 260 270

GTT GGT GCT GGA GCT GGA GCA GAA ACT ATA GGT GCT GGA GGC TAT
VGAGAGAETIGAGGY>

280 290 300 310

GGA GGT GGG GCT GGA TCA GGA GCA CGT GCG GCT TCA GCA TCC GGA
GGGAGSGARAASASG>

320 330 340 350 360

GCT GGT ACT GGA TAT GGT TCG TCT GGA GGT TAT AAC GTA GGT ACC
AGTGYGSSGGYNVGT>

370 380 390 400

GGA ATA AGT ACT TCT TCT GGC GCT GCA TCT AGC TAC TCT GTT TCT
GISTSSGAASSYSVS>

410 420 430 440 450

GCT GGA GGT TAT GCT TCA ACA GGT GTT GGT ATT GGA TCC ACT GTT
AGGYASTGVGIGSTV>

460 470 480 490

ACA TCC ACA ACA TCT CGT TTG AGT TCT GCT GAA GCA TGT TCT AGA
TSTTSRLSSAEACSR>
500 510 520 530 540

ATA TCT GCT GCG GCT TCC ACT TTA GTA TCT GGA TCC TTG AAT ACT
ISAAASTLVSGSLNT>

550 560 570 580

GCA GCT TTA CCA TCT GTA ATT TCG GAT CTT TTT GCC CAA GTT AGT
AALPSVISDLFAQVS>

590 600 610 620 630

GCA TCA TCA CCC GGG GTA TCA GGT AAC GAA GTT TTG ATT CAA GTT
ASSPGVSGNEVLIQV>

640 650 660 670

TTG TTG GAA ATT GTT TCT TCT CTT ATC CAT ATT CTT AGT TCT TCT
LLEIVSSLIHILSSS>



680 690 700 710 720 

* * * * * * * * * 

AGT GTA GGG CAA GTA GAT TTC AGT TCT GTT GGT TCA TCT GCT GCA 
SVGQVDFSSVGSSAA> 

730 740 750 760 

* * * * * * * * * 

GCC GTT GGT CAA TCC ATG CAA GTT GTA ATG GGT TAA AACA AAATG 
AVGQSMQVVMG*>

770 780 790 800 810 

********** 

GCTCT CTCTC TGTTA TATGC ATTCT GTAAT TTCTT CTAAA CTATT AAAAT 

820 830 840 850 860 

********** 

AATGT AATAA TTTCC TGCAT AAATA AAAAT ATTTT TCTGC AAAAA AAAAA



870 
* 

AAAAA 
10 20 30 40

* * * * * * * * * 

GCT GGA GCT GCT GCT GGT GCT GGA GGC TAT GAC GGA CAA GGA GGA TAT 
AGAAAGAGGYDGQGGY> 

50 60 70 80 90 

* * * * * * * * * * 

GGT GCT GGA GCA GGA GCT GCT GCA GCT GCT GGA GCA GGA GCC GGA AGC 
GAGAGAAAAAGAGAGS> 

100 110 120 130 140 

* * * * * * * * * 

GTT GGA GGT TAT GGA ACA GGT GCT GTA GCT GGA TCT GGA ACA GCT GCT 
VGGYGTGAVAGSGTAA>

150 160 
* * * * * 

GGT GCA GGA GCC AGA GCT GGT 
G A G A R A G> 
10 20 30 40

★ * * * * * * * * 

GGA GCT GCT GCT GGT GCA GGA GCC GGA GCA GGT AGT ACA GGA GGC TTT 
GAAAGAGAGAGSTGGF> 

50 60 70 80 90 

* * * * ****** 

GGC GGA CAA GGA GGA TAT GGT GGC GGT GCA GGA GCT GCA GCT GCT GGA 
GGQGGYGAGAGAAAAG> 

100 110 120 130 140 

** * ** * ** * 

GCT TTT GCC GGA AGA GCT GGG GGT TAC GGA AGA GCT GCT GGA GCT GCG 
AFAGRAGGYGRAAGAA>

150 160 170 180 190 

********** 

GCT GGA ACT GGA GCT GCT GCT GGT GCA GGA GCC GGA GCT GGT AGT ACA 
AGTGAAAGAGAGAGST> 

200 210 220 230 240

* * * * * * * * * *

GGA GGC TTT GGC GGA CAA AGA GGA TAC GGT GCC GGC AGA AGT AAT GGA
GGFGGQRGYGAGRSNG>
10 20 30 40

TAT GGT GGA CAA GGC GGA TAT GGT GCT GGA GCA GGA GCT GGT GCT GCT
YGGQGGYGAGAGAGAA>

50 60 70 80 90

GCA GCC GCA GGA TAT GGA GCC GGT GCT GGA GGA TAC GGT GGA CAA GCT
AAAGYGAGAGGYGGQA>

100 110 120 130 140

GGT TAT GGT GCC GGA GCT GGA GCT GGT AGT TCT GCA GGA AAT GCT TTC
GYGAGAGAGSSAGNAF>